Using the search engine as recommendation engine

Recommendations from the search engine

Sesam Hackathon, Warsaw, 2014-03-23

Lars Marius Garshol, larsga@bouvet.no, http://twitter.com/larsga

This whole presentation is about Ted Dunning’s proposed approach to recommendations

Based on his 1993 paper (below)
– references at the end

Very simple method, dead easy to implement
– seems to work pretty well

Inspiration

Recommenders are usually designed to predict ratings
– Dunning believes this is the wrong approach
– people’s ratings don’t necessarily reflect what they’ll buy
– go by what people do rather than what they say

You don’t want to recommend Bob Dylan
– everyone has already heard of him and knows what they think
– you want to recommend things that are new to the user

You don’t want to recommend things everyone likes
– most likely they already know about them

Thoughts on recommendations

Step 1
– work out which things tend to occur together
– that is, if you buy this, you’re likely to also buy that
– however, we only want pairs which are statistically significant

Step 2
– index up the significant pairs in a search engine
– use search to produce the actual results

The actual approach

Statistically significant co-occurrence

Part the first


The starting point

Some kind of log of user actions

User has
– bought a movie | album | book | ...
– opened a document
– ...

From this raw material, we can work out what things tend to go together
– and whether this is significant

       i1    i2    i3    i4    i5    i6    i7
i1      -    23    42     0     0     5     7
i2     23     -     6     1   129     2    10
i3     42     6     -     3     0   492     1
i4      0     1     3     -     2     3     1
i5      0   129     0     2     -    94     2
i6      5     2   492     3    94     -     1
i7      7    10     1     1     2     1     -

Item-to-item matrix
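A matrix like this can be counted straight from the action log. The following is a minimal sketch, assuming the log is a list of (user, item) tuples; that format and the function name `cooccurrence_counts` are illustrative choices, not taken from the talk's code.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(log):
    """Count, for each pair of items, how many users acted on both.

    `log` is an iterable of (user, item) tuples (an assumed format).
    """
    # Group the log by user, ignoring repeated actions on the same item.
    items_by_user = defaultdict(set)
    for user, item in log:
        items_by_user[user].add(item)

    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    return counts


# Toy log: u1 touched i1 and i2; u2 touched i1, i2 and i3.
log = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"), ("u2", "i3")]
counts = cooccurrence_counts(log)
print(counts[("i1", "i2")])
```

Only off-diagonal cells are counted, which matches the matrix on the slide, where an item is never paired with itself.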

k[0][0] = the number in the matrix on previous slide

k[0][1] = the sum of that whole column minus k[0][0]

k[1][0] = the sum of that whole row minus k[0][0]

k[1][1] = the sum of the entire matrix minus k[0][0] minus k[1][0] minus k[0][1]

Producing the k 2x2 matrix

How to compute the k matrix for a given cell in the matrix on the previous slide
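The four rules can be written out directly. This sketch uses the item-to-item matrix from the slides as a list of lists, with the diagonal (which the slide leaves out) treated as 0; that choice and the name `k_matrix` are assumptions made for illustration.

```python
# Item-to-item matrix from the slides, rows and columns i1..i7.
# The diagonal is set to 0 here (an assumption; the slide leaves it out).
M = [
    [0, 23, 42, 0, 0, 5, 7],
    [23, 0, 6, 1, 129, 2, 10],
    [42, 6, 0, 3, 0, 492, 1],
    [0, 1, 3, 0, 2, 3, 1],
    [0, 129, 0, 2, 0, 94, 2],
    [5, 2, 492, 3, 94, 0, 1],
    [7, 10, 1, 1, 2, 1, 0],
]


def k_matrix(m, row, col):
    """Build the 2x2 contingency table for the cell m[row][col]."""
    k00 = m[row][col]                       # the cell itself
    k01 = sum(r[col] for r in m) - k00      # rest of the column
    k10 = sum(m[row]) - k00                 # rest of the row
    total = sum(sum(r) for r in m)
    k11 = total - k00 - k10 - k01           # everything else
    return [[k00, k01], [k10, k11]]


# The (i3, i6) cell, zero-based indexes 2 and 5.
k = k_matrix(M, 2, 5)
print(k)  # [[492, 105], [52, 999]]
```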

If the output of LLR(k) is above some threshold, the pair is considered significant.

Check the Python code at
– https://github.com/larsga/py-snippets/tree/master/machine-learning/llr
– this requires a lot of memory and CPU

Or just use Mahout
– RowSimilarityJob does exactly this

Doing it for real
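For reference, the LLR score of a 2x2 matrix k can be sketched in a few lines. This uses the entropy-based form of the log-likelihood ratio (G²) statistic; it is not copied from the llr.py script, and the `threshold` parameter is left to the caller because the slides do not give a value.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k):
    """Log-likelihood ratio for a 2x2 contingency table k."""
    (k00, k01), (k10, k11) = k
    row_entropy = entropy(k00 + k01, k10 + k11)
    col_entropy = entropy(k00 + k10, k01 + k11)
    mat_entropy = entropy(k00, k01, k10, k11)
    return 2.0 * (row_entropy + col_entropy - mat_entropy)


def is_significant(k, threshold):
    """True if the pair scores above the caller-chosen threshold."""
    return llr(k) > threshold


independent = llr([[10, 10], [10, 10]])   # no association at all
associated = llr([[492, 105], [52, 999]])  # items that go together
print(independent, associated)
```

A table where the two items occur together exactly as often as chance predicts scores zero, while a strongly associated pair scores high.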

Search engine as recommender

Part the second

Take all the items and index them up with the search engine in the usual way
– that is, each item has an id, a title, a description, ...

Then, add a “magic” field
– put into it the IDs of all the items that appear in a significant pair with this item
– let’s call this field “indicators”

Now we’re ready to do recommendations

Indexing with the search engine
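The indexing step can be sketched without any particular search engine by building the documents as plain dictionaries. The function name `build_documents`, the input shapes, and the toy IDs and titles are all made up for illustration; a real setup would hand these documents to Lucene or a similar engine.

```python
def build_documents(items, significant_pairs):
    """Attach an "indicators" field to every item.

    items: dict mapping item id -> dict of ordinary fields (title, ...).
    significant_pairs: iterable of (id_a, id_b) pairs that passed the
    significance test.
    """
    docs = {}
    for item_id, fields in items.items():
        doc = dict(fields)
        doc["id"] = item_id
        doc["indicators"] = set()
        docs[item_id] = doc

    # Each item in a significant pair becomes an indicator for the other.
    for a, b in significant_pairs:
        docs[a]["indicators"].add(b)
        docs[b]["indicators"].add(a)

    # Sorted lists are easier to inspect and to feed to an indexer.
    for doc in docs.values():
        doc["indicators"] = sorted(doc["indicators"])
    return docs


# Toy data, not taken from any real dataset.
items = {
    "m1": {"title": "Movie A"},
    "m2": {"title": "Movie B"},
    "m3": {"title": "Movie C"},
}
pairs = [("m1", "m2"), ("m2", "m3")]
docs = build_documents(items, pairs)
print(docs["m2"])
```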

Collect some set of items for which the user has expressed a preference
– by buying them, looking at them, rating them, whatever

The IDs of these items are your query
– search the “indicators” field
– the search results are your recommendations

That’s it!
– pack up, go home

Doing recommendations
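As a stand-in for the search engine, the query step can be sketched as a scan over in-memory documents that counts how many of the user's liked IDs appear in each document's "indicators" field. The function `recommend`, the toy documents, and the decision to skip items the user already liked are assumptions; a real engine would also weight the matches, not just count them.

```python
def recommend(docs, liked_ids, limit=10):
    """Rank documents by how many liked IDs appear in their indicators."""
    liked = set(liked_ids)
    scored = []
    for doc_id, doc in docs.items():
        if doc_id in liked:
            continue  # assumption: don't recommend what the user already liked
        hits = len(liked.intersection(doc["indicators"]))
        if hits:
            scored.append((hits, doc_id))
    # Most matching indicators first; ties broken by id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


# Toy index, not taken from any real dataset.
docs = {
    "m1": {"indicators": ["m2", "m3"]},
    "m2": {"indicators": ["m1", "m3", "m4"]},
    "m3": {"indicators": ["m1", "m2"]},
    "m4": {"indicators": ["m2"]},
    "m5": {"indicators": []},
}
print(recommend(docs, ["m1", "m2"]))  # ['m3', 'm4']
```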

Imagine that you’re searching for movies, and you type “the godfather”
– “the” appears in all documents, so documents matching that get a low relevance score
– “godfather” appears in very few documents, so matches on that get a high score
– this is basically TF/IDF in a nutshell

Now, imagine you liked two movies: “The Godfather” and “The Daytrippers”
– nearly all movies have “The Godfather” as an indicator
– very few have “The Daytrippers”
– the second will therefore influence recommendations much more

Why does it work?
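The effect can be illustrated with a simplified inverse document frequency, ln(N / df), applied to the "indicators" field. This is a sketch only: the formula is a textbook simplification, not Lucene's actual scoring, and the ten toy documents are made up so that "godfather" is a common indicator and "daytrippers" a rare one.

```python
import math


def idf(docs, term):
    """Simplified inverse document frequency of an indicator ID."""
    df = sum(1 for d in docs.values() if term in d["indicators"])
    return math.log(len(docs) / df) if df else 0.0


def score(docs, doc_id, liked_ids):
    """Sum the IDF of every liked ID found in the document's indicators."""
    indicators = docs[doc_id]["indicators"]
    return sum(idf(docs, t) for t in liked_ids if t in indicators)


# Ten toy documents: "godfather" is an indicator for nine of them,
# "daytrippers" for only two.
docs = {"m%d" % i: {"indicators": ["godfather"]} for i in range(8)}
docs["m8"] = {"indicators": ["godfather", "daytrippers"]}
docs["m9"] = {"indicators": ["daytrippers"]}

liked = ["godfather", "daytrippers"]
ranked = sorted(docs, key=lambda d: (-score(docs, d, liked), d))
print(ranked[:3])
```

The document matching only the rare indicator outranks all the documents matching only the common one, which is the behaviour the slide describes.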

Trying it out for real

Part the third

Again, the code is on GitHub
– very simple webapp based on web.py and Lucene
– https://github.com/larsga/py-snippets/tree/master/machine-learning/llr

The underlying data is the MovieLens dataset
– 10 million ratings of 10,000 movies by 72,000 users
– http://grouplens.org/datasets/movielens/

Real demo with real data

llr.py
– this chews the data, producing the significant pairs
– takes a huge amount of memory and about 30 minutes
– have made absolutely no attempt to optimize it

llr_index.py
– reads output of previous script, makes Lucene index

recom-ui.py
– the actual web application

Three scripts

Liked one movie

Liked two movies

Movies with highest LLR score together with this movie

Liked three movies

Recommendations are actually now spot-on. At least for me.

class Movie:
    def GET(self, movieid):
        nocache()
        # Look up the movie being displayed
        doc = search.do_query('id', movieid)[0]
        #recoms = search.do_query('indicators', movieid)
        # Look up each movie listed in doc.bets
        recoms = [search.do_query('id', betid)[0] for betid in doc.bets]
        # Recommendations based on what the user has liked so far
        if hasattr(session, 'liked'):
            youlike = search.do_query('indicators', session.liked)
        else:
            youlike = []
        return render.movie(doc, recoms, youlike)

Complete code for movie page

Further work

Winding up

Tweak the parameters a bit to see what happens

Can we support a “Dislike” button?

Test it with more kinds of data

Learn how to do this with Mahout

Things left to do

What is this?

From Ted Dunning’s slides

And this?

The original 1993 paper
– http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.5962

Ebook with lots of background but little detail
– http://www.mapr.com/practical-machine-learning

Slides covering the same material
– www.slideshare.net/tdunning/building-multimodal-recommendation-engines-using-search-engines

Blog post with actual equations
– http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html

References
