Prediction io–final 2014-jp-handout

Post on 15-Jul-2015


Yuki Furuta Naoto Yamamoto Tran Hoan

Facebook Open Academy International

What is PredictionIO?

An open source machine learning server for software developers to create predictive features in their web and mobile apps.

Currently used by thousands of developers and powering hundreds of applications [1].

[1] http://github.com/PredictionIO/


Architecture

[Figure: PredictionIO architecture. Labels: Web App, Mobile App; Import Data (EventServer); Data Source; Data Preparator; Algorithm 1 … Algorithm N; Model 1 … Model N; Serving; Query; Prediction Result; Spark; HBase; HDFS. Callouts: Data In, Data Out, Horizontal scalability, Productivity.]

http://docs.prediction.io/resources/systems/

What can PredictionIO do?

• Content-based recommendation
• Trend detection
• Sentiment analysis
• Restaurant recommendation
• User similarity
• Data analysis

Engine (recommendation, rank,…)

YELPIO-NAVI

MovieLens

YELPIO-NAVI

Recommendation App for Restaurants Using Yelp! Dataset

Naoto Yamamoto (JP), Tran Hoan (JP), Inhwan Eric Lee (USA)

What is YELPIO-NAVI?

Yelp: the American counterpart of Tabelog (食べログ), a Japanese restaurant review site

Information available:
• restaurants' addresses and stars
• users' stars

Restaurants are recommended using this information.

YELPIO-NAVI Demo Setup

• Batch import data through Ruby SDK
• Store & retrieve business data
• Retrieve & store business data through REST API
• Retrieve prediction results through REST API

(1) Neighbourhood model
(2) Collaborative Filtering

https://github.com/OminiaVincit/predictionio_rails

http://yelpio.hongo.wide.ad.jp/

https://github.com/OminiaVincit/YELPIO_demo2

http://www.yelp.com/

YELPIO-NAVI Demo

http://yelpio.hongo.wide.ad.jp/

http://zorovn.hongo.wide.ad.jp/

MovieLens: Content-based Movie Recommendation

Yuki Furuta (JP), Nhu-Quynh Beth (USA), Yue Shi (USA), Shaocong Mo (USA)

PredictionIO x MovieLens - Content-Based Movie Recommendation Engine -

A. Collaborative Filtering

[Figure: users A and B]
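One common way to compare two users such as A and B in collaborative filtering is cosine similarity over the movies both have rated. The slides do not say which measure the team used, so the following is only a sketch under that assumption; `UserSimilarity` and its inputs (maps from movie ID to rating) are hypothetical names and not part of PredictionIO.

```scala
// Illustrative only: cosine similarity is an assumed choice of measure.
object UserSimilarity {
  /** Cosine similarity over the movies both users rated; 0.0 if none in common. */
  def cosine(a: Map[Int, Double], b: Map[Int, Double]): Double = {
    val common = a.keySet.intersect(b.keySet).toSeq
    if (common.isEmpty) 0.0
    else {
      val dot = common.map(m => a(m) * b(m)).sum
      val normA = math.sqrt(common.map(m => a(m) * a(m)).sum)
      val normB = math.sqrt(common.map(m => b(m) * b(m)).sum)
      // Guard against all-zero rating vectors.
      if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
    }
  }
}
```

Users with a high similarity can then borrow each other's highly rated movies as recommendation candidates.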


MovieLens Datasets
• 100,000 ratings (1-5) from 943 users on 1682 movies
• Simple demographic info for the users (age, gender, occupation, zip)
• Information about the movies (title, release date, genre)

B. Content-Based

A (age: 20, male, RUS), B (age: 21, male, KZH)

20-year-old man likes:
• Action 60%
• Comedy 10%
• English 10%
• etc.
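A profile such as "20-year-old man likes: Action 60%" can be read as the share of each feature among the likes of users in one demographic group. The sketch below illustrates that reading; the record types `DemoUser` and `GenreLike` and the object `DemographicProfile` are hypothetical names introduced for illustration, and grouping by exact age and gender is an assumption.

```scala
// Hypothetical record types for illustration; not part of PredictionIO.
final case class DemoUser(id: Int, age: Int, gender: String)
final case class GenreLike(userId: Int, genre: String)

object DemographicProfile {
  /** Share of each genre among the likes of users matching (age, gender). */
  def genreShares(users: Seq[DemoUser], likes: Seq[GenreLike],
                  age: Int, gender: String): Map[String, Double] = {
    // IDs of the users that belong to the demographic group.
    val group = users.filter(u => u.age == age && u.gender == gender).map(_.id).toSet
    val groupLikes = likes.filter(l => group.contains(l.userId))
    if (groupLikes.isEmpty) Map.empty[String, Double]
    else groupLikes.groupBy(_.genre).map { case (genre, ls) =>
      genre -> ls.size.toDouble / groupLikes.size
    }
  }
}
```

A new user with no ratings yet can then be served from the profile of the matching group.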


Dataset

    val DataSourceAttributeNames = AttributeNames(
      user = "pio_user",
      item = "pio_item",
      u2iActions = Set("rate"),
      itypes = "pio_itypes",
      starttime = "pio_starttime",
      endtime = "pio_endtime",
      inactive = "pio_inactive",
      rating = "pio_rating")

[Figure: engine pipeline. Stage labels: Reading Data, Preparation, Algorithms (Feature Based, User Based), Serve. Step labels: Prepare, Train, Query.]

MovieLens
- User (ID, Age, Gender, Occupation, Zip)
- Movie (ID, Title, Year, Genre, Actors,…)


Feature Based Algorithm

[Figure: user Michael and two feature sets]
• Stanley Kubrick; America; Comedy, Black, SF
• Rowan Atkinson; United Kingdom; Comedy, SF, Action


Feature Based Algorithm

[Figure: user Michael, feature sets, and the recommended item]
• Stanley Kubrick; USA; Comedy, Black, Scientific, Fantasy
• Rowan Atkinson; United Kingdom; Comedy, SF, Action, Fantasy
• Comedy, Fantasy, Action, USA
• Mark Wahlberg; USA; Comedy, Fantasy, Action: Recommend!
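One plausible reading of the feature-based slide is that the features of movies a user liked are tallied into a preference profile, and a candidate is scored by how many of those features it carries. The sketch below follows that reading; it is an assumption about the approach, not the team's implementation, and `FeatureMovie` and `FeatureBased` are hypothetical names.

```scala
// Hypothetical type for illustration; titles are placeholders.
final case class FeatureMovie(title: String, features: Set[String])

object FeatureBased {
  /** Count how often each feature occurs across the movies a user liked. */
  def profile(liked: Seq[FeatureMovie]): Map[String, Int] =
    liked.flatMap(_.features).groupBy(identity).map { case (f, fs) => f -> fs.size }

  /** Score a candidate as the summed profile weight of its features. */
  def score(profile: Map[String, Int], candidate: FeatureMovie): Int =
    candidate.features.toSeq.map(f => profile.getOrElse(f, 0)).sum

  /** Return the n highest-scoring candidates for the given liked movies. */
  def recommend(liked: Seq[FeatureMovie], candidates: Seq[FeatureMovie], n: Int): Seq[FeatureMovie] = {
    val p = profile(liked)
    candidates.sortBy(c => -score(p, c)).take(n)
  }
}
```

With the two feature sets from the slide as liked movies, a candidate tagged Mark Wahlberg, USA, Comedy, Fantasy, Action scores 6 (Comedy 2 + Fantasy 2 + USA 1 + Action 1), which is why it would rank above a candidate sharing no features.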


Feature Based Algorithm

UserID: 1, Age: 24, Gender: M, Occupation: technician, Zip: 85711
UserID: 2, Age: 53, Gender: F, Occupation: other, Zip: 94043
UserID: 3, Age: 23, Gender: M, Occupation: writer, Zip: 32067
UserID: 4, Age: 24, Gender: M, Occupation: technician, Zip: 43537
UserID: 5, Age: 33, Gender: F, Occupation: other, Zip: 15213

User   Movie   Rating      Threshold (e.g. 2.0)
196    242     3.0 / 5     BUY
186    302     3.0 / 5     BUY
22     377     1.0 / 5     -
244    51      2.0 / 5     -
166    346     1.0 / 5     -
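The threshold step turns star ratings into a binary signal. The slide marks the two 3.0 ratings as BUY and the 2.0 rating as "-", which suggests a strict comparison against the threshold; the sketch below assumes that. `StarRating` and `RatingThreshold` are hypothetical names, while the sample values are the five ratings shown on the slide.

```scala
// Hypothetical type for illustration; values below come from the slide.
final case class StarRating(user: Int, movie: Int, stars: Double)

object RatingThreshold {
  /** True when the rating is strictly above the threshold (treated as "BUY"). */
  def isBuy(r: StarRating, threshold: Double): Boolean = r.stars > threshold

  /** Label each rating as "BUY" or "-" for the given threshold. */
  def labels(rs: Seq[StarRating], threshold: Double): Seq[String] =
    rs.map(r => if (isBuy(r, threshold)) "BUY" else "-")

  val sample: Seq[StarRating] = Seq(
    StarRating(196, 242, 3.0),
    StarRating(186, 302, 3.0),
    StarRating(22, 377, 1.0),
    StarRating(244, 51, 2.0),
    StarRating(166, 346, 1.0))
}
```

Calling `RatingThreshold.labels(RatingThreshold.sample, 2.0)` reproduces the slide's sequence: BUY, BUY, -, -, -.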

Train

Query, e.g.
• Recommend 5 movies for UserID: 2
• Recommend 5 movies which are “Comedy” for UserID: 2
• Recommend 2 movies which are “Action” by Rowan Atkinson for UserID: 2

1. MovieID: 297 Score: -8.53295620539528
2. MovieID: 251 Score: -13.326537513274323
3. MovieID: 292 Score: -15.276804370241758
4. MovieID: 290 Score: -32.944167483781335
5. MovieID: 314 Score: -37.45527366828404
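The ranked list above is ordered from the highest score to the lowest, so answering "recommend N movies" can be sketched as sorting scored candidates and keeping the first N. How the scores themselves are produced is not shown in the slides; `ScoredMovie` and `TopN` are hypothetical names used only for this illustration.

```scala
// Hypothetical type for illustration; scoring itself is out of scope here.
final case class ScoredMovie(movieId: Int, score: Double)

object TopN {
  /** Return the n candidates with the highest scores, best first. */
  def topN(scored: Seq[ScoredMovie], n: Int): Seq[ScoredMovie] =
    scored.sortWith(_.score > _.score).take(n)
}
```

Given the five scored movies from the slide in any order, `TopN.topN(scored, 2)` keeps MovieID 297 and MovieID 251.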

Predict


…to be continued

• Scale for Big Data
• Multi-engines & Multi-algorithms
• Predict with more features
• Evaluation

Thank you for listening

Japanese team

Yuki Furuta Naoto Yamamoto Tran Hoan