Using the search engine as recommendation engine

Recommendations from the search engine
Sesam Hackathon, Warsaw, 2014-03-23
Lars Marius Garshol, http://twitter.com/larsga

description

Describes a simple approach to using the search engine to drive recommendations.


Page 2: Using the search engine as recommendation engine

Inspiration

This whole presentation is about Ted Dunning’s proposed approach to recommendations
  - based on his 1993 paper (references at the end)

Very simple method, dead easy to implement
  - seems to work pretty well

Page 3: Using the search engine as recommendation engine

Thoughts on recommendations

Recommendation engines are usually designed to predict ratings
  - Dunning believes this is the wrong approach
  - people’s ratings don’t necessarily reflect what they’ll buy
  - go by what people do rather than what they say

You don’t want to recommend Bob Dylan
  - everyone’s already heard of him, and knows what they think
  - you want to recommend things that are new to the user

You don’t want to recommend things everyone likes
  - most likely they already know about them

Page 4: Using the search engine as recommendation engine

The actual approach

Step 1
  - work out which things tend to occur together
  - that is, if you buy this, you’re likely to also buy that
  - however, we only want pairs which are statistically significant

Step 2
  - index up the significant pairs in a search engine
  - use search to produce the actual results

Page 5: Using the search engine as recommendation engine

Statistically significant co-occurrence

Part the first

Page 6: Using the search engine as recommendation engine

The starting point

Some kind of log of user actions

User   Item
u1     i1
u1     i2
u2     i1
u3     i2
u3     i3
u3     i4
...    ...

User has
  - bought a movie | album | book | ...
  - opened a document
  - ...

From this raw material, we can work out what things tend to go together
  - and whether this is significant
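A minimal sketch of the counting step, assuming the log is a list of (user, item) tuples like the table on this slide. The function name cooccurrence_counts and the choice to count each pair once per user are illustrative assumptions, not taken from llr.py.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_counts(log):
    """Count, for every pair of items, how many users touched both."""
    # Group the log by user; a set ignores repeated actions on the same item.
    items_by_user = defaultdict(set)
    for user, item in log:
        items_by_user[user].add(item)

    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # store both directions so the matrix is symmetric
    return dict(counts)

# The example log from the slide (hypothetical ids).
log = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"),
       ("u3", "i2"), ("u3", "i3"), ("u3", "i4")]
counts = cooccurrence_counts(log)
```

The resulting dictionary is a sparse form of the item-to-item matrix: pairs that never occur together are simply absent.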


Page 8: Using the search engine as recommendation engine

Item-to-item matrix

     i1   i2   i3   i4   i5   i6   i7
i1    -   23   42    0    0    5    7
i2   23    -    6    1  129    2   10
i3   42    6    -    3    0  492    1
i4    0    1    3    -    2    3    1
i5    0  129    0    2    -   94    2
i6    5    2  492    3   94    -    1
i7    7   10    1    1    2    1    -

Page 9: Using the search engine as recommendation engine

Producing the k 2x2 matrix

How to compute the k matrix for a given cell in the matrix on the previous slide:

k[0][0] = the number in that cell

k[0][1] = the sum of that whole column minus k[0][0]

k[1][0] = the sum of that whole row minus k[0][0]

k[1][1] = the sum of the entire matrix minus k[0][0] minus k[1][0] minus k[0][1]

If the output of LLR(k), the log-likelihood ratio, is above some threshold, the pair is considered significant.
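The recipe above can be sketched as follows. This uses the entropy form of the log-likelihood ratio (G-test) statistic; whether llr.py computes it in exactly this way is not confirmed here. The names k_matrix, entropy and llr are illustrative, and treating the empty diagonal of the item-to-item matrix as 0 is an assumption.

```python
import math

def k_matrix(matrix, row, col):
    """Build the 2x2 contingency table for one cell, as described on the slide."""
    k00 = matrix[row][col]
    k01 = sum(r[col] for r in matrix) - k00    # rest of the column
    k10 = sum(matrix[row]) - k00               # rest of the row
    total = sum(sum(r) for r in matrix)
    k11 = total - k00 - k01 - k10              # everything else
    return [[k00, k01], [k10, k11]]

def x_log_x(x):
    """x * ln(x), with the convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts: N * H."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k):
    """Log-likelihood ratio for a 2x2 table; 0 means no association."""
    row = entropy(k[0][0] + k[0][1], k[1][0] + k[1][1])
    col = entropy(k[0][0] + k[1][0], k[0][1] + k[1][1])
    mat = entropy(k[0][0], k[0][1], k[1][0], k[1][1])
    return max(0.0, 2.0 * (row + col - mat))   # clamp tiny negative rounding errors

# Item-to-item matrix from the previous slide, diagonal assumed to be 0.
matrix = [
    [0, 23, 42, 0, 0, 5, 7],
    [23, 0, 6, 1, 129, 2, 10],
    [42, 6, 0, 3, 0, 492, 1],
    [0, 1, 3, 0, 2, 3, 1],
    [0, 129, 0, 2, 0, 94, 2],
    [5, 2, 492, 3, 94, 0, 1],
    [7, 10, 1, 1, 2, 1, 0],
]
k = k_matrix(matrix, 2, 5)   # the (i3, i6) cell
score = llr(k)
```

A pair would then be kept when its score exceeds whatever threshold you pick; the slides do not say which value was used.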

Page 10: Using the search engine as recommendation engine

Doing it for real

Check the Python code at
  - https://github.com/larsga/py-snippets/tree/master/machine-learning/llr
  - this requires a lot of memory and CPU

Or just use Mahout
  - RowSimilarityJob does exactly this

Page 11: Using the search engine as recommendation engine

Search engine as recommender

Part the second

Page 12: Using the search engine as recommendation engine

Indexing with the search engine

Take all the items and index them up with the search engine in the usual way
  - that is, each item has an id, a title, a description, etc.

Then, add a “magic” field
  - put into it the IDs of all the items that appear in a significant pair with this item
  - let’s call this field “indicators”

Now we’re ready to do recommendations
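A minimal sketch of preparing the documents, assuming the significant pairs are already available as (id, id) tuples. The function name add_indicators and the dict-based documents are illustrative and not taken from llr_index.py; a real setup would hand these documents to Lucene or another search engine for indexing.

```python
from collections import defaultdict

def add_indicators(items, significant_pairs):
    """Return copies of the items with an extra 'indicators' field."""
    indicators = defaultdict(set)
    for a, b in significant_pairs:
        # A significant pair links both items to each other.
        indicators[a].add(b)
        indicators[b].add(a)

    docs = []
    for item in items:
        doc = dict(item)  # copy, so the input is left untouched
        doc["indicators"] = sorted(indicators[item["id"]])
        docs.append(doc)
    return docs

# Hypothetical items and pairs, for illustration only.
items = [{"id": "m1", "title": "First movie"},
         {"id": "m2", "title": "Second movie"},
         {"id": "m3", "title": "Third movie"},
         {"id": "m4", "title": "Fourth movie"}]
pairs = [("m1", "m2"), ("m2", "m1"), ("m1", "m3")]
docs = add_indicators(items, pairs)
```

Items that appear in no significant pair get an empty indicators field, so they can still be found by title but will never be recommended.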

Page 13: Using the search engine as recommendation engine

Doing recommendations

Collect some set of items for which the user has expressed a preference
  - by buying them, looking at them, rating them, whatever

The IDs of these items are your query
  - search the “indicators” field
  - the search results are your recommendations

That’s it!
  - pack up, go home
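To make the query step concrete, here is a small in-memory stand-in for the search engine. It ranks documents by how many liked IDs appear in their indicators field; a real search engine would use its own relevance scoring instead. The function name recommend and the decision to skip items the user already liked are assumptions made for this sketch.

```python
def recommend(docs, liked_ids, limit=10):
    """Rank documents by how many liked ids match their indicators."""
    liked = set(liked_ids)
    scored = []
    for doc in docs:
        if doc["id"] in liked:
            continue  # assumption: do not recommend what the user already liked
        hits = len(liked & set(doc["indicators"]))
        if hits:
            scored.append((hits, doc["id"]))
    # Most matches first; ties broken by id to keep the order stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]

# Hypothetical indexed documents, for illustration only.
docs = [{"id": "m1", "indicators": ["m2", "m3"]},
        {"id": "m2", "indicators": ["m1"]},
        {"id": "m3", "indicators": ["m1", "m2"]},
        {"id": "m4", "indicators": []}]
recommendations = recommend(docs, ["m1", "m2"])
```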

Page 14: Using the search engine as recommendation engine

Why does it work?

Imagine that you’re searching for movies, and you type “the godfather”
  - “the” appears in all documents, so documents matching that get a low relevance score
  - “godfather” appears in very few documents, so matches on that get a high score
  - this is basically TF/IDF in a nutshell

Now, imagine you liked two movies: “The Godfather” and “The Daytrippers”
  - nearly all movies have “The Godfather” as an indicator
  - very few have “The Daytrippers”
  - the second will therefore influence recommendations much more
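The effect can be illustrated with a toy inverse document frequency calculation. This uses the plain textbook form log(N / df), which is a simplification and not the exact formula Lucene applies; the names idf_weights and score, and the sample data, are made up for the illustration.

```python
import math

def idf_weights(docs):
    """Textbook IDF: indicators found in fewer documents get a higher weight."""
    n = len(docs)
    df = {}
    for doc in docs:
        for indicator in set(doc["indicators"]):
            df[indicator] = df.get(indicator, 0) + 1
    return {indicator: math.log(n / count) for indicator, count in df.items()}

def score(doc, liked_ids, idf):
    """Sum the IDF weights of the liked ids that match this document."""
    matches = set(liked_ids) & set(doc["indicators"])
    return sum(idf[m] for m in matches)

# "godfather" is an indicator for every document, "daytrippers" for only one.
docs = [{"id": "m1", "indicators": ["godfather", "daytrippers"]},
        {"id": "m2", "indicators": ["godfather"]},
        {"id": "m3", "indicators": ["godfather"]},
        {"id": "m4", "indicators": ["godfather"]}]
idf = idf_weights(docs)
liked = ["godfather", "daytrippers"]
scores = {doc["id"]: score(doc, liked, idf) for doc in docs}
```

With these weights the indicator shared by every document contributes nothing, so the ranking is decided entirely by the rare one.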

Page 15: Using the search engine as recommendation engine

Trying it out for real

Part the third

Page 16: Using the search engine as recommendation engine

Real demo with real data

Again, the code is on GitHub
  - very simple webapp based on web.py and Lucene
  - https://github.com/larsga/py-snippets/tree/master/machine-learning/llr

The underlying data is the MovieLens dataset
  - 10 million ratings of 10,000 movies by 72,000 users
  - http://grouplens.org/datasets/movielens/

Page 17: Using the search engine as recommendation engine

Three scripts

llr.py
  - this chews the data, producing the significant pairs
  - takes a huge amount of memory and about 30 minutes
  - have made absolutely no attempt to optimize it

llr_index.py
  - reads the output of the previous script, makes a Lucene index

recom-ui.py
  - the actual web application


Page 20: Using the search engine as recommendation engine


Liked one movie

Page 21: Using the search engine as recommendation engine

Liked two movies

Movies with highest LLR score together with this movie

Page 22: Using the search engine as recommendation engine


Liked three movies

Recommendations are actually now spot-on. At least for me.

Page 23: Using the search engine as recommendation engine

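Complete code for movie page

    # nocache, search, session and render are defined elsewhere in the
    # web application and are not shown on the slide.
    class Movie:
        def GET(self, movieid):
            nocache()
            # Look up the movie being displayed.
            doc = search.do_query('id', movieid)[0]
            #recoms = search.do_query('indicators', movieid)
            # Fetch each movie listed in this movie's 'bets' field.
            recoms = [search.do_query('id', bet_id)[0] for bet_id in doc.bets]
            # Recommend from what the user has liked so far, if anything.
            if hasattr(session, 'liked'):
                youlike = search.do_query('indicators', session.liked)
            else:
                youlike = []
            return render.movie(doc, recoms, youlike)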

Page 24: Using the search engine as recommendation engine

Further work

Winding up

Page 25: Using the search engine as recommendation engine

Things left to do

Tweak the parameters a bit to see what happens

Can we support a “Dislike” button?

Test it with more kinds of data

Learn how to do this with Mahout

Page 26: Using the search engine as recommendation engine


What is this?

From Ted Dunning’s slides

Page 27: Using the search engine as recommendation engine


And this?

From Ted Dunning’s slides

Page 28: Using the search engine as recommendation engine


And this?

From Ted Dunning’s slides

Page 29: Using the search engine as recommendation engine

References

The original 1993 paper
  - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.5962

Ebook with lots of background but little detail
  - http://www.mapr.com/practical-machine-learning

Slides covering the same material
  - www.slideshare.net/tdunning/building-multimodal-recommendation-engines-using-search-engines

Blog post with actual equations
  - http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html