My talk about recommendation and search to the Hive

Posted on 08-Sep-2014


Recommendation is really valuable and much easier to implement than most people think. Here's how.

Transcript of My talk about recommendation and search to the Hive

© MapR Technologies, confidential

Introduction to Mahout


Topic For This Section

• What is recommendation?
• What makes it different?
• What is multi-model recommendation?
• How can I build it using common household items?


Oh … Also This

• Detailed break-down of a recommendation system running with Mahout on MapR

• With code examples


I may have to summarize


I may have to summarize

just a bit


Part 1: 5 minutes of background


Part 2: 5 minutes: I want a pony


Part 1: 5 minutes of background


What Does Machine Learning Look Like?


What Does Machine Learning Look Like?

O(κkd + k³d) = O(k²d log n + k³d) for small k, high quality
O(κd log k) or O(d log κ log k) for larger k, looser quality

But tonight we’re going to show you how to keep it simple yet powerful…


Recommendations as Machine Learning

• Recommendation:
– Involves observing interactions between people taking action (users) and items; these interactions are the input data for the recommender model
– Goal is to suggest additional appropriate or desirable interactions
– Applications include movie, music, or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts


Part 2: How recommenders work

(I still want a pony)


Recommendations

Recap: Behavior of a crowd helps us understand what individuals will do


Recommendations

Alice got an apple and a puppy

Charles got a bicycle



Recommendations

Alice got an apple and a puppy

Charles got a bicycle

Bob got an apple



Recommendations

What else would Bob like?



Recommendations

A puppy, of course!



You get the idea of how recommenders work… (By the way, like me, Bob also wants a pony)


Recommendations

What if everybody gets a pony?

What else would you recommend for Amelia?


Recommendations

If everybody gets a pony, it’s not a very good indicator of what else to predict...


Problems with Raw Co-occurrence

• Very popular items co-occur with everything (it doesn’t help that everybody wants a pony…)
– Examples: welcome document; elevator music
• Widespread occurrence is not interesting
– Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
• What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to base recommendations


Get Useful Indicators from Behaviors

• Use log files to build a history matrix of users × items
– Remember: this history of interactions will be sparse compared to all potential combinations
• Transform to a co-occurrence matrix of items × items
• Look for useful co-occurrences — anomalous ones — to make an indicator matrix
– The Log Likelihood Ratio (LLR) test helps judge which co-occurrences can be used with confidence as indicators of preference
– RowSimilarityJob in Apache Mahout uses LLR
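The first two steps — turning a sparse user × item history into an items × items co-occurrence matrix — can be sketched in a few lines. This is a toy illustration using plain dictionaries (not Mahout's distributed implementation), with the Alice/Bob/Charles data from the earlier slides:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(history):
    """history: dict of user -> set of items.
    Returns dict (item_i, item_j) -> number of users who
    interacted with both, i.e. the items x items matrix."""
    counts = defaultdict(int)
    for items in history.values():
        for i, j in combinations(sorted(items), 2):
            counts[(i, j)] += 1
            counts[(j, i)] += 1  # the matrix is symmetric
    return counts

history = {
    "Alice":   {"apple", "puppy"},
    "Bob":     {"apple"},
    "Charles": {"bicycle"},
}
print(cooccurrence(history)[("apple", "puppy")])  # 1 — only Alice has both
```

Real systems then filter these raw counts with the LLR test described next, since popular items co-occur with everything.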


Log Files

[diagram: raw log entries pairing users (Alice, Bob, Charles — i.e. u1, u2, u3) with items t1–t4]


History Matrix: Users by Items

[table: users Alice, Bob, and Charles as rows, items as columns; a ✔ marks each observed interaction]


Co-occurrence Matrix: Items by Items

[matrix: items × items co-occurrence counts]

How do you tell which co-occurrences are useful?


Co-occurrence Matrix: Items by Items

[matrix: items × items co-occurrence counts]

Use the LLR test to turn co-occurrence into indicators…


Co-occurrence Binary Matrix

[matrix: co-occurrence reduced to binary indicators]


Spot the Anomaly

          A       not A
B         13      1000
not B     1000    100,000

          A       not A
B         1       0
not B     0       2

          A       not A
B         1       0
not B     0       10,000

          A       not A
B         10      0
not B     0       100,000

What conclusion do you draw from each situation?


Spot the Anomaly

• Root LLR is roughly like standard deviations
• In Apache Mahout, RowSimilarityJob uses LLR

          A       not A
B         13      1000
not B     1000    100,000
root LLR = 0.90

          A       not A
B         1       0
not B     0       2
root LLR = 1.95

          A       not A
B         1       0
not B     0       10,000
root LLR = 4.52

          A       not A
B         10      0
not B     0       100,000
root LLR = 14.3

What conclusion do you draw from each situation?
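The root-LLR numbers on this slide can be reproduced directly from the G² (log-likelihood ratio) statistic for a 2×2 contingency table. A minimal sketch — not Mahout's implementation, just the standard formula:

```python
import math

def root_llr(k11, k12, k21, k22):
    """sqrt of the G^2 log-likelihood ratio statistic for a 2x2
    contingency table: G^2 = 2 * sum(k * ln(k / expected))."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, rows[0], cols[0]), (k12, rows[0], cols[1]),
             (k21, rows[1], cols[0]), (k22, rows[1], cols[1]))
    g2 = sum(2.0 * k * math.log(k * n / (r * c)) for k, r, c in cells if k > 0)
    return math.sqrt(max(g2, 0.0))  # clamp tiny negative rounding error

# The four tables above, in reading order:
print(round(root_llr(13, 1000, 1000, 100000), 2))  # 0.9
print(round(root_llr(1, 0, 0, 2), 2))              # 1.95
print(round(root_llr(1, 0, 0, 10000), 2))          # 4.52
print(round(root_llr(10, 0, 0, 100000), 1))        # 14.3
```

Note the scale: 0.90 is unremarkable (about one "standard deviation"), while 14.3 is wildly anomalous — exactly the intuition the slide describes.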


Co-occurrence Matrix

[matrix: items × items co-occurrence counts]

Recap: Use the LLR test to turn co-occurrence into indicators


Indicator Matrix: Anomalous Co-Occurrence

[matrix: indicator matrix with one anomalous co-occurrence row marked ✔]

Result: The marked row will be added to the indicator field in the item document…


Indicator Matrix

id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)

That one row from indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.

Note: the indicator field is added directly to meta-data for a document in the Solr index. No need to create a separate index for indicators.
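At query time, the recommender is then just a search against that indicator field. A minimal sketch of composing such a query — the collection name, field names, and localhost endpoint are illustrative assumptions, not from the talk; the URL shape is standard Solr select syntax:

```python
from urllib.parse import urlencode

def recommendation_query(recent_items,
                         base="http://localhost:8983/solr/items/select"):
    """Build a Solr select URL matching documents whose `indicators`
    field contains any of the user's recent items.  Documents matching
    more indicators score higher; that score is the recommendation
    ranking."""
    q = "indicators:(%s)" % " ".join(recent_items)
    return base + "?" + urlencode({"q": q, "fl": "id,title,score"})

url = recommendation_query(["t1", "t3"])
print(url)
```

With the puppy document above, a user whose history includes t1 would match on `indicators:(t1 t3)` and get t4 (the puppy) back.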


Internals of the Recommender Engine



Looking Inside LucidWorks

What to recommend if new user listened to 2122: Fats Domino & 303: Beatles?

Recommendation is “1710 : Chuck Berry”


Real-time recommendation query and results: Evaluation


Search-based Recommendations

• Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location


Search-based Recommendations

• Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40


Search-based Recommendations

• Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40

• Sample query
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40


Search-based Recommendations

• Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
(original data and meta-data)

– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40
(derived from co-occurrence and cross-occurrence analysis)

• Sample query
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40
(the recommendation query)


For example

• Users enter queries (A)
– (actor = user, item = query)
• Users view videos (B)
– (actor = user, item = video)
• AᵀA gives query recommendation
– “did you mean to ask for”
• BᵀB gives video recommendation
– “you might like these videos”


The punch-line

• BᵀA recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite: it doesn’t look at content or meta-data)
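The AᵀA / BᵀB / BᵀA products are easy to see in a toy example. The interaction matrices below are invented for illustration (rows are users, columns are items, 1 = interacted), and plain Python lists stand in for the sparse matrices a real system would use:

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

# A: users x queries, B: users x videos.
A = [[1, 0],
     [1, 1],
     [0, 1]]
B = [[0, 1],
     [1, 1],
     [1, 0]]

AtA = matmul(transpose(A), A)  # query-to-query co-occurrence
BtB = matmul(transpose(B), B)  # video-to-video co-occurrence
BtA = matmul(transpose(B), A)  # video x query cross-occurrence

print(BtA)  # [[1, 2], [2, 1]]
```

Here BtA[0][1] = 2 says two users who issued query 1 also watched video 0 — so a fresh instance of query 1 recommends video 0, without ever looking at content or meta-data.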


Real-life example

• Query: “Paco de Lucia”
• Conventional meta-data search results:
– “hombres de paco” times 400
– not much else
• Recommendation-based search:
– flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff



Hypothetical Example

• Want a navigational ontology?
• Just put labels on a web page with traffic
– This gives A = users × label clicks
• Remember viewing history
– This gives B = users × items
• Cross recommend
– BᵀA = label-to-item mapping
• After several users click, results are whatever users think they should be


Nice. But can we do better?


A Quick Simplification

• Users who do h (a vector of things a user has done) also do r
– User-centric recommendations: r = Aᵀ(A h) — A translates things into users; the transpose translates back to things
– Item-centric recommendations: r = (AᵀA) h — change the order of operations


Symmetry Gives Cross Recommendations

Conventional recommendations with off-line learning
Cross recommendations


[diagrams: conventional recommendations use a single users × things matrix; cross recommendations use users × thing type 1 alongside users × thing type 2]


Bonus Round:

When worse is better


The Real Issues After First Production

• Exploration
• Diversity
• Speed
• Not the last fraction of a percent


Result Dithering

• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near-perfect record of making actual performance much better


“Made more difference than any other change”

Why Dithering Works

[diagram: feedback loop — real-time recommender → log files → overnight training → back to the recommender]


Exploring The Second Page


Simple Dithering Algorithm

• Synthetic score from log rank plus Gaussian

• Pick noise scale to provide desired level of mixing

• Typically

• Also… use floor(t/T) as seed
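A minimal sketch of the algorithm as these bullets describe it: build a synthetic score from the log of each result's rank plus Gaussian noise, then re-sort, seeding the generator with floor(t/T) so the shuffle is stable within a time window. The noise scale ε = 0.3 and the 5-minute window here are illustrative guesses, not values from the talk:

```python
import math
import random
import time

def dither(ranked_items, epsilon=0.3, period_s=300, now=None):
    """Re-order ranked results: score = log(rank) + Gaussian(0, epsilon),
    then sort ascending (lower synthetic score ranks higher).

    Seeding with floor(t / period_s) keeps the shuffle stable for
    period_s seconds, so a user paging through results sees a
    consistent ordering within that window.
    """
    now = time.time() if now is None else now
    rng = random.Random(math.floor(now / period_s))
    scored = [(math.log(rank) + rng.gauss(0, epsilon), item)
              for rank, item in enumerate(ranked_items, start=1)]
    scored.sort()
    return [item for _, item in scored]
```

Because log compresses the score range as rank grows, nearby results swap often while top results rarely fall far — second-page items get their chance without destroying relevance.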


Example … ε = 2


Lesson: Exploration is good


Part 3: What about that worked example?


Analyze with Map-Reduce

[diagram: complete history → co-occurrence analysis (Mahout), combined with item meta-data → Solr indexer → index shards]


Deploy with Conventional Search System

[diagram: web tier turns user history into Solr search queries against the index shards; item meta-data is served from the same index]
