My talk about recommendation and search to the Hive

67
© MapR Technologies, confidential © MapR Technologies, confidential Introduction to Mahout

description

Recommendation is really valuable and much easier to implement than most people think. Here's how.

Transcript of My talk about recommendation and search to the Hive

Page 1: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Introduction to Mahout

Page 2: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Topic For This Section

• What is recommendation?• What makes it different?• What is multi-model recommendation?• How can I build it using common household

items?

Page 3: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Oh … Also This

• Detailed break-down of a recommendation system running with Mahout on MapR

• With code examples

Page 4: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

I may have to summarize

Page 5: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

I may have to summarize

just a bit

Page 6: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Part 1:5 minutes of background

Page 7: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Part 2:5 minutes: I want a pony

Page 8: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Page 9: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Part 1:5 minutes of background

Page 10: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

What Does Machine Learning Look Like?

Page 11: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

What Does Machine Learning Look Like?

O(κ k d + k3 d) = O(k2 d log n + k3 d) for small k, high qualityO(κ d log k) or O(d log κ log k) for larger k, looser quality

But tonight we’re going to show you how to keep it simple yet powerful…

Page 12: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Recommendations as Machine Learning• Recommendation:

– Involves observation of interactions between people taking action (users) and items for input data to the recommender model

– Goal is to suggest additional appropriate or desirable interactions– Applications include: movie, music or map-based restaurant choices;

suggesting sale items for e-stores or via cash-register receipts

Page 13: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Page 14: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Page 15: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Part 2:How recommenders work

(I still want a pony)

Page 16: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Recommendations

Recap:Behavior of a crowd helps us understand what individuals will do

Page 17: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Recommendations

Alice got an apple and a puppy

Charles got a bicycle

Alice

Charles

Page 18: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Recommendations

Alice got an apple and a puppy

Charles got a bicycle

Bob got an apple

Alice

Bob

Charles

Page 19: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Recommendations

What else would Bob like?

?

Alice

Bob

Charles

Page 20: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Recommendations

A puppy, of course!

Alice

Bob

Charles

Page 21: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

You get the idea of how recommenders work… (By the way, like me, Bob also wants a pony)

Page 22: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Recommendations

What if everybody gets a pony?

?

Alice

Bob

Charles

Amelia What else would you recommend for Amelia?

Page 23: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Recommendations

?

Alice

Bob

Charles

AmeliaIf everybody gets a pony, it’s not a very good indicator of what to else predict...

Page 24: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Problems with Raw Co-occurrence

• Very popular items co-occur with everything (it’s doesn’t help that everybody wants a pony…)– Examples: Welcome document; Elevator music

• Widespread occurrence is not interesting– Unless you want to offer an item that is constantly

desired, such as razor blades (or ponies)• What we want is anomalous co-occurrence

– This is the source of interesting indicators of preference on which to base recommendation

Page 25: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Get Useful Indicators from Behaviors

• Use log files to build history matrix of users x items– Remember: this history of interactions will be sparse

compared to all potential combinations• Transform to a co-occurrence matrix of items x items• Look for useful co-occurrence by looking for

anomalous co-occurrences to make an indicator matrix– Log Likelihood Ratio (LLR) can be helpful to judge which

co-occurrences can with confidence be used as indicators of preference

– RowSimilarityJob in Apache Mahout uses LLR

Page 26: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Log Files

Alice

Bob

Charles

Alice

Bob

Charles

Alice

Page 27: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Log Files

u1

u3

u2

u1

u3

u2

u1

t1

t4

t3

t2

t3

t3

t1

Page 28: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

History Matrix: Users by Items

Alice

Bob

Charles

✔ ✔ ✔✔ ✔

✔ ✔

Page 29: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Co-occurrence Matrix: Items by Items

-

1 21 1

1

12 1

How do you tell which co-occurrences are useful?.

00

0 0

Page 30: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Co-occurrence Matrix: Items by Items

-

1 21 1

1

12 1

Use LLR test to turn co-occurrence into indicators…

00

0 0

Page 31: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Co-occurrence Binary Matrix

11not

not

1

Page 32: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Spot the Anomaly

A not A

B 13 1000

not B 1000 100,000

A not A

B 1 0

not B 0 2

A not A

B 1 0

not B 0 10,000

A not A

B 10 0

not B 0 100,000

What conclusion do you draw from each situation?

Page 33: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Spot the Anomaly

• Root LLR is roughly like standard deviations• In Apache Mahout, RowSimilarityJob uses LLR

A not A

B 13 1000

not B 1000 100,000

A not A

B 1 0

not B 0 2

A not A

B 1 0

not B 0 10,000

A not A

B 10 0

not B 0 100,000

0.90 1.95

4.52 14.3

What conclusion do you draw from each situation?

Page 34: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Co-occurrence Matrix

-

1 21 1

1

12 1

Recap: Use LLR test to turn co-occurrence into indicators

00

0 0

Page 35: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Indicator Matrix: Anomalous Co-Occurrence

✔✔

Result: The marked row will be added to the indicator field in the item document…

Page 36: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Indicator Matrix

✔id: t4title: puppydesc: The sweetest little puppy ever.keywords: puppy, dog, pet

indicators: (t1)

That one row from indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.

Note: the indicator field is added directly to meta-data for a document in the Solr index. No need to create a separate index for indicators.

Page 37: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Internals of the Recommender Engine

37

Page 38: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Internals of the Recommender Engine

38

Page 39: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Looking Inside LucidWorks

What to recommend if new user listened to 2122: Fats Domino & 303: Beatles?

Recommendation is “1710 : Chuck Berry”

39

Real-time recommendation query and results: Evaluation

Page 40: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Search-based Recommendations

• Sample document– Merchant Id– Field for text

description– Phone– Address– Location

Page 41: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Search-based Recommendations

• Sample document– Merchant Id– Field for text description– Phone– Address– Location

– Indicator merchant id’s– Indicator industry (SIC) id’s– Indicator offers– Indicator text– Local top40

Page 42: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Search-based Recommendations

• Sample document– Merchant Id– Field for text description– Phone– Address– Location

– Indicator merchant id’s– Indicator industry (SIC) id’s– Indicator offers– Indicator text– Local top40

• Sample query– Current location– Recent merchant

descriptions– Recent merchant id’s– Recent SIC codes– Recent accepted

offers– Local top40

Page 43: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Search-based Recommendations

• Sample document– Merchant Id– Field for text description– Phone– Address– Location

– Indicator merchant id’s– Indicator industry (SIC) id’s– Indicator offers– Indicator text– Local top40

• Sample query– Current location– Recent merchant

descriptions– Recent merchant id’s– Recent SIC codes– Recent accepted

offers– Local top40

Original data and meta-data

Derived from cooccurrence and cross-occurrence analysis

Recommendation query

Page 44: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

For example

• Users enter queries (A)– (actor = user, item=query)

• Users view videos (B)– (actor = user, item=video)

• ATA gives query recommendation– “did you mean to ask for”

• BTB gives video recommendation– “you might like these videos”

Page 45: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

The punch-line

• BTA recommends videos in response to a query– (isn’t that a search engine?)– (not quite, it doesn’t look at content or meta-data)

Page 46: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Real-life example

• Query: “Paco de Lucia”• Conventional meta-data search results:

– “hombres de paco” times 400– not much else

• Recommendation based search:– Flamenco guitar and dancers– Spanish and classical guitar– Van Halen doing a classical/flamenco riff

Page 47: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Real-life example

Page 48: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Hypothetical Example

• Want a navigational ontology?• Just put labels on a web page with traffic

– This gives A = users x label clicks• Remember viewing history

– This gives B = users x items• Cross recommend

– B’A = label to item mapping• After several users click, results are whatever

users think they should be

Page 49: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Nice. But we can do better?

Page 50: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

A Quick Simplification

• Users who do h (a vector of things a user has done)

• Also do r User-centric recommendations(transpose translates back to things)

Item-centric recommendations(change the order of operations)

A translates things into users

Page 51: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Symmetry Gives Cross Recommentations

Conventional recommendations with off-line learningCross recommendations

Page 52: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

users

things

Page 53: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

users

thingtype 1

thingtype 2

Page 54: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Page 55: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Bonus Round:

When worse is better

Page 56: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

The Real Issues After First Production

• Exploration• Diversity• Speed

• Not the last fraction of a percent

Page 57: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Result Dithering

• Dithering is used to re-order recommendation results – Re-ordering is done randomly

• Dithering is guaranteed to make off-line performance worse

• Dithering also has a near perfect record of making actual performance much better

Page 58: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Result Dithering

• Dithering is used to re-order recommendation results – Re-ordering is done randomly

• Dithering is guaranteed to make off-line performance worse

• Dithering also has a near perfect record of making actual performance much better

“Made more difference than any other change”

Page 59: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Why Dithering Works

Real-time recommender

Overnight training

Log Files

Page 60: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Exploring The Second Page

Page 61: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential

Simple Dithering Algorithm

• Synthetic score from log rank plus Gaussian

• Pick noise scale to provide desired level of mixing

• Typically

• Also… use floor(t/T) as seed

Page 62: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Example … ε = 2

Page 63: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Lesson:Exploration is good

Page 64: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Part 3:What about that worked example?

Page 65: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

SolRIndexerSolR

IndexerSolrindexing

Cooccurrence(Mahout)

Item meta-data

Indexshards

Complete history

Analyze with Map-Reduce

Page 66: My talk about recommendation and search to the Hive

© MapR Technologies, confidential

SolRIndexerSolR

IndexerSolrsearchWeb tier

Item meta-data

Indexshards

User history

Deploy with Conventional Search System

Page 67: My talk about recommendation and search to the Hive

© MapR Technologies, confidential © MapR Technologies, confidential