My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Introduction to Mahout


Topic For This Section

• What is recommendation?
• What makes it different?
• What is multi-model recommendation?
• How can I build it using common household items?


Oh … Also This

• Detailed break-down of a recommendation system running with Mahout on MapR

• With code examples


I may have to summarize

just a bit


Part 1: 5 minutes of background


Part 2: 5 minutes: I want a pony


Part 1: 5 minutes of background


What Does Machine Learning Look Like?


O(κ k d + k^3 d) = O(k^2 d log n + k^3 d) for small k, high quality
O(κ d log k) or O(d log κ log k) for larger k, looser quality

But tonight we’re going to show you how to keep it simple yet powerful…


Recommendations as Machine Learning

• Recommendation:
– Involves observation of interactions between people taking action (users) and items for input data to the recommender model
– Goal is to suggest additional appropriate or desirable interactions
– Applications include: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts


Part 2: How recommenders work

(I still want a pony)


Recommendations

Recap: Behavior of a crowd helps us understand what individuals will do


Recommendations

Alice got an apple and a puppy

Charles got a bicycle

Bob got an apple


Recommendations

What else would Bob like?


Recommendations

A puppy, of course!


You get the idea of how recommenders work… (By the way, like me, Bob also wants a pony)


Recommendations

What if everybody gets a pony?

What else would you recommend for Amelia?

Recommendations

If everybody gets a pony, it’s not a very good indicator of what else to predict...


Problems with Raw Co-occurrence

• Very popular items co-occur with everything (it doesn’t help that everybody wants a pony…)
– Examples: Welcome document; Elevator music
• Widespread occurrence is not interesting
– Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
• What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to base recommendation


Get Useful Indicators from Behaviors

• Use log files to build history matrix of users x items
– Remember: this history of interactions will be sparse compared to all potential combinations
• Transform to a co-occurrence matrix of items x items
• Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
– Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
– RowSimilarityJob in Apache Mahout uses LLR
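As a rough illustration of the first two steps (log files to history matrix, history matrix to co-occurrence counts), here is a minimal Python sketch. It is not Mahout's RowSimilarityJob; the toy log and the function names `build_history` and `cooccurrence` are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_history(log):
    """Collapse (user, item) log records into a sparse history: user -> set of items."""
    history = defaultdict(set)
    for user, item in log:
        history[user].add(item)
    return dict(history)


def cooccurrence(history):
    """Count, for each unordered item pair, how many users interacted with both items."""
    counts = defaultdict(int)
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical log, loosely following the Alice / Bob / Charles story.
log = [
    ("alice", "apple"), ("alice", "puppy"), ("alice", "pony"),
    ("bob", "apple"), ("bob", "pony"),
    ("charles", "bicycle"), ("charles", "pony"),
]

history = build_history(log)
pairs = cooccurrence(history)
for pair, count in sorted(pairs.items()):
    print(pair, count)
```

Only pairs that actually occur are stored, which matches the point that the history is sparse compared to all potential combinations. The pony pairs up with every other item here, which is why raw counts alone are a poor indicator.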

Log Files

[Figure: log records for users Alice, Bob, Charles, Alice, Bob, Charles, Alice]

Log Files

Users: u1, u3, u2, u1, u3, u2, u1
Items: t1, t4, t3, t2, t3, t3, t1


History Matrix: Users by Items

[Figure: history matrix with rows Alice, Bob and Charles and one column per item; check marks show the recorded interactions]


Co-occurrence Matrix: Items by Items

[Figure: items-by-items co-occurrence count matrix; the cell layout did not survive extraction]

How do you tell which co-occurrences are useful?

Co-occurrence Matrix: Items by Items

[Figure: items-by-items co-occurrence count matrix; the cell layout did not survive extraction]

Use LLR test to turn co-occurrence into indicators…

Co-occurrence Binary Matrix

[Figure: 2 x 2 binary co-occurrence matrix with cells labeled "1" and "not"; the cell layout did not survive extraction]


Spot the Anomaly

• Root LLR is roughly like standard deviations
• In Apache Mahout, RowSimilarityJob uses LLR

          A       not A
B         13      1000
not B     1000    100,000
Root LLR: 0.90

          A       not A
B         1       0
not B     0       2
Root LLR: 1.95

          A       not A
B         1       0
not B     0       10,000
Root LLR: 4.52

          A       not A
B         10      0
not B     0       100,000
Root LLR: 14.3

What conclusion do you draw from each situation?
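To make the scores concrete, here is a small Python sketch of the log-likelihood ratio for a 2 x 2 contingency table, written as the standard G-test sum. It is an illustration rather than Mahout's code; the function names `llr` and `root_llr` are assumptions, and the root is returned unsigned.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G-test statistic) for a 2 x 2 table of counts."""
    n = k11 + k12 + k21 + k22
    row = (k11 + k12, k21 + k22)
    col = (k11 + k21, k12 + k22)
    cells = (
        (k11, row[0], col[0]),
        (k12, row[0], col[1]),
        (k21, row[1], col[0]),
        (k22, row[1], col[1]),
    )
    total = 0.0
    for k, r, c in cells:
        if k > 0:  # empty cells contribute nothing (0 * log 0 is taken as 0)
            total += k * math.log(k * n / (r * c))
    return max(0.0, 2.0 * total)  # clamp tiny negative rounding errors


def root_llr(k11, k12, k21, k22):
    """Unsigned square root of the LLR, roughly comparable to standard deviations."""
    return math.sqrt(llr(k11, k12, k21, k22))


# The four tables from the slide: (A and B, B without A, A without B, neither).
tables = [
    (13, 1000, 1000, 100000),
    (1, 0, 0, 2),
    (1, 0, 0, 10000),
    (10, 0, 0, 100000),
]
for table in tables:
    print(table, round(root_llr(*table), 2))
```

The printed values can be compared with the 0.90, 1.95, 4.52 and 14.3 shown for the four tables: a large raw count (13) in a sea of unrelated occurrences scores lowest, while ten exclusive co-occurrences score highest.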


Co-occurrence Matrix

[Figure: items-by-items co-occurrence count matrix; the cell layout did not survive extraction]

Recap: Use LLR test to turn co-occurrence into indicators

Indicator Matrix: Anomalous Co-Occurrence

[Figure: indicator matrix with the anomalous co-occurrences marked by check marks]

Result: The marked row will be added to the indicator field in the item document…


Indicator Matrix

id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)

That one row from indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.

Note: the indicator field is added directly to meta-data for a document in the Solr index. No need to create a separate index for indicators.
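A minimal Python sketch of that step, assuming the item meta-data and the indicator rows are already available as plain dictionaries. The function name `add_indicator_fields` and the in-memory representation are assumptions for illustration; sending the resulting documents to Solr is left out.

```python
def add_indicator_fields(items, indicator_rows):
    """Return copies of the item documents with an 'indicators' field added.

    items: list of dicts holding the original meta-data, each with an 'id'.
    indicator_rows: dict mapping an item id to the ids flagged as indicators
    for it (one row of the indicator matrix per item).
    """
    docs = []
    for item in items:
        doc = dict(item)  # copy, so the original meta-data is left untouched
        doc["indicators"] = sorted(indicator_rows.get(item["id"], ()))
        docs.append(doc)
    return docs


# The puppy document from the slide; its indicator row contains only t1.
items = [
    {
        "id": "t4",
        "title": "puppy",
        "desc": "The sweetest little puppy ever.",
        "keywords": ["puppy", "dog", "pet"],
    },
]
indicator_rows = {"t4": {"t1"}}

docs = add_indicator_fields(items, indicator_rows)
print(docs[0])
```

Because the indicators are just one more field on the same document, the ordinary item index can serve both search and recommendation.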


Internals of the Recommender Engine


Looking Inside LucidWorks

What to recommend if new user listened to 2122: Fats Domino & 303: Beatles?

Recommendation is “1710 : Chuck Berry”


Real-time recommendation query and results: Evaluation
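The query side can be imitated in a few lines of Python. This is a toy in-memory stand-in for the search engine, not LucidWorks or Solr code: the indicator lists below are invented for illustration, and a real engine would rank with its own relevance scoring rather than a plain overlap count.

```python
def recommend(docs, recent_ids, limit=10):
    """Rank items by how many of the user's recent ids appear in their indicators."""
    recent = set(recent_ids)
    scored = []
    for doc in docs:
        if doc["id"] in recent:
            continue  # do not recommend what the user has just listened to
        overlap = len(recent & set(doc["indicators"]))
        if overlap:
            scored.append((overlap, doc["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


# Artist ids and names come from the slide; the indicator fields are hypothetical.
docs = [
    {"id": "1710", "name": "Chuck Berry", "indicators": ["2122", "303"]},
    {"id": "2122", "name": "Fats Domino", "indicators": ["1710"]},
    {"id": "303", "name": "Beatles", "indicators": ["1710"]},
    {"id": "9999", "name": "Unrelated artist", "indicators": ["8888"]},
]

# A new user who listened to Fats Domino (2122) and the Beatles (303).
print(recommend(docs, ["2122", "303"]))
```

In a search engine the same request would be an ordinary query of the recent ids against the indicator field, for example `indicators:(2122 303)` in Lucene query syntax, where the field name `indicators` is an assumption.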


Search-based Recommendations

• Sample document

Original data and meta-data:
– Merchant Id
– Field for text description
– Phone
– Address
– Location

Derived from cooccurrence and cross-occurrence analysis:
– Indicator merchant ids
– Indicator industry (SIC) ids
– Indicator offers
– Indicator text
– Local top40

• Sample query (the recommendation query)
– Current location
– Recent merchant descriptions
– Recent merchant ids
– Recent SIC codes
– Recent accepted offers
– Local top40


For example

• Users enter queries (A)
– (actor = user, item = query)
• Users view videos (B)
– (actor = user, item = video)
• A^T A gives query recommendation
– “did you mean to ask for”
• B^T B gives video recommendation
– “you might like these videos”


The punch-line

• B^T A recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
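A small Python sketch of this cross-occurrence idea, under the assumption that both behaviors are logged as (user, item) pairs. The logs and the function names `cross_occurrence` and `videos_for_query` are hypothetical, and raw counts are used where a production system would keep only the anomalous pairs (for example by an LLR test).

```python
from collections import defaultdict


def cross_occurrence(query_log, view_log):
    """For each (query, video) pair, count the users who both issued the query
    and viewed the video. This is the cross-occurrence of the two behaviors."""
    queries_by_user = defaultdict(set)
    for user, query in query_log:
        queries_by_user[user].add(query)
    videos_by_user = defaultdict(set)
    for user, video in view_log:
        videos_by_user[user].add(video)

    counts = defaultdict(lambda: defaultdict(int))
    for user, queries in queries_by_user.items():
        for query in queries:
            for video in videos_by_user.get(user, ()):
                counts[query][video] += 1
    return counts


def videos_for_query(counts, query, limit=5):
    """Videos most often viewed by users who entered this query."""
    ranked = sorted(counts.get(query, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:limit]]


# Hypothetical logs: A holds (user, query) records, B holds (user, video) records.
query_log = [("u1", "paco de lucia"), ("u2", "paco de lucia"), ("u3", "van halen")]
view_log = [
    ("u1", "flamenco guitar"),
    ("u2", "flamenco guitar"), ("u2", "spanish guitar"),
    ("u3", "eruption"),
]

counts = cross_occurrence(query_log, view_log)
print(videos_for_query(counts, "paco de lucia"))
```

Note that nothing here inspects video titles or descriptions: the mapping from query to video comes entirely from user behavior.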


Real-life example

• Query: “Paco de Lucia”
• Conventional meta-data search results:
– “hombres de paco” times 400
– not much else
• Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff


Hypothetical Example

• Want a navigational ontology?
• Just put labels on a web page with traffic
– This gives A = users x label clicks
• Remember viewing history
– This gives B = users x items
• Cross recommend
– B’A = label to item mapping
• After several users click, results are whatever users think they should be


Nice. But we can do better?


A Quick Simplification

• Users who do h (a vector of things a user has done)
• Also do r
– A translates things into users
– User-centric recommendations (transpose translates back to things)
– Item-centric recommendations (change the order of operations)
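The formulas on this slide did not survive in the transcript. One reading that is consistent with the captions and with the matrix products used on the earlier slides, offered as an assumption rather than as the slide's exact content, is the following, where A is the users-by-things history matrix, h is one user's history vector, and u is a symbol introduced only for this sketch:

```latex
\begin{align*}
u &= A h            && \text{($A$ translates things into users)} \\
r &= A^{T} (A h)    && \text{(user-centric: the transpose translates back to things)} \\
r &= (A^{T} A)\, h  && \text{(item-centric: same product, different order of operations)}
\end{align*}
```

Both expressions for r are equal because matrix multiplication is associative. The item-centric form is the useful one in practice, since A^T A can be computed off-line and h applied at query time. By the same pattern, (B^T A) h would be the cross-recommendation form.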


Symmetry Gives Cross Recommendations

Conventional recommendations with off-line learning
Cross recommendations

[Figure: matrix of users by things]

[Figure: matrices of users by thing type 1 and users by thing type 2]

Bonus Round:

When worse is better


The Real Issues After First Production

• Exploration
• Diversity
• Speed

• Not the last fraction of a percent


Result Dithering

• Dithering is used to re-order recommendation results
– Re-ordering is done randomly

• Dithering is guaranteed to make off-line performance worse

• Dithering also has a near perfect record of making actual performance much better

“Made more difference than any other change”


Why Dithering Works

[Figure: diagram with the components Real-time recommender, Overnight training and Log Files]


Exploring The Second Page


Simple Dithering Algorithm

• Synthetic score from log rank plus Gaussian

• Pick noise scale to provide desired level of mixing

• Typically

• Also… use floor(t/T) as seed
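The bullets above can be sketched in Python as follows. This is a minimal illustration, not the speaker's implementation: the function name `dither` and the parameters `sigma` (the standard deviation of the Gaussian noise) and `period` (the T in floor(t/T)) are assumptions, and the transcript does not show how the slides' ε maps onto the noise scale.

```python
import math
import random


def dither(results, sigma, t, period):
    """Re-order ranked results using a synthetic score: log(rank) + Gaussian noise.

    results: items in their original ranked order (best first).
    sigma:   noise scale; 0 keeps the original order, larger values mix more.
    t:       current time, e.g. in seconds.
    period:  length of the window during which the ordering stays stable.
    """
    # Seeding with floor(t / period) gives the same shuffle within one window,
    # so a user who reloads the page does not see the results jump around.
    rng = random.Random(int(t // period))
    scored = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(results, start=1)
    ]
    scored.sort(key=lambda pair: pair[0])
    return [item for _, item in scored]


ranked = ["a", "b", "c", "d", "e", "f", "g", "h"]
print(dither(ranked, sigma=0.5, t=1200, period=600))
```

Because the noise is added to the log of the rank, the top few results are rarely displaced very far, while items deep in the list occasionally surface on the first page and so get a chance to collect feedback.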


Example … ε = 2


Lesson: Exploration is good


Part 3: What about that worked example?


Analyze with Map-Reduce

[Figure: architecture diagram with the components Complete history, Cooccurrence (Mahout), Item meta-data, Solr indexing (Solr indexers) and Index shards]


Deploy with Conventional Search System

[Figure: architecture diagram with the components User history, Web tier, Solr search, Solr indexers, Item meta-data and Index shards]
