Lets eat presentation_final_20160521

Let’s Eat! Brad Binder, Lesley Chapman, Jon Froiland, David Lee


Page 1: Lets eat presentation_final_20160521

Let’s Eat! Brad Binder, Lesley Chapman, Jon Froiland, David Lee

Page 2: Lets eat presentation_final_20160521

Introduction

History:

Since 1979 there have been services that review and rank restaurants (Zagat)

Today:

According to Nielsen, Americans have on average 41 apps on their smartphones, many of which provide a recommendation service

Page 3: Lets eat presentation_final_20160521

Introduction

A variety of restaurant recommendation apps have been created

Features include finding restaurants, making reservations, and identifying healthy options

A Restaurant Recommender would aim to help users save money and time, and could help cure buyer’s remorse

Page 4: Lets eat presentation_final_20160521

Problem Summary

We need a tool that resolves the challenge of finding a restaurant in your area based upon specific cuisine and menu item criteria entered by the user

Page 5: Lets eat presentation_final_20160521

Hypothesis

Hypothesis: The Restaurant Recommender will recommend a restaurant that suits the user more accurately than selecting a restaurant based on chance alone

H0 (null hypothesis): A user will find a restaurant that they like based on chance alone

HA (alternative hypothesis): The Restaurant Recommender app will provide a better restaurant suggestion to the user compared to chance alone
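
The slides do not say how the hypothesis was tested. One possible approach, sketched here under assumptions, is a one-sided exact binomial test: count how many recommendations users liked and compare that with an assumed chance rate. The function name, the chance rate, and all numbers below are hypothetical and are not results from the project.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p_chance: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p_chance)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical numbers for illustration only: 15 liked out of 20 suggestions,
# against an assumed 50% chance of liking a randomly chosen restaurant.
p_value = binomial_tail(successes=15, trials=20, p_chance=0.5)

# Reject H0 at the 5% level if so few "chance" outcomes would be this good.
reject_null = p_value < 0.05
```

A small p-value would support HA; a real evaluation would also need a defensible estimate of the chance rate.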

Page 6: Lets eat presentation_final_20160521

Data Ingestion

• WORM (write once, read many) storage

–Stored HTML menu pages in one location which could be read many times

• Parsed HTML with BeautifulSoup

–Built out a list of “Restaurant” objects

• GET requests to WMATA API to pull metro station data

–JSON data parsed with pandas read_json() function
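
As a rough illustration of the metro station step, the sketch below turns a JSON payload into station records. It uses the standard-library `json` module as a stand-in for the pandas `read_json()` call named on the slide, and it runs on an inline sample instead of a live GET request. The field names (`Stations`, `Name`, `Lat`, `Lon`, `LineCode1`), the `Station` class, and the sample values are assumptions made for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    latitude: float
    longitude: float
    line: str


def parse_stations(payload: str) -> list[Station]:
    """Convert a JSON station listing into Station records."""
    records = json.loads(payload)["Stations"]  # assumed top-level key
    return [
        Station(r["Name"], float(r["Lat"]), float(r["Lon"]), r["LineCode1"])
        for r in records
    ]


# Illustrative payload; the coordinates are placeholders, not real data.
sample_payload = json.dumps(
    {
        "Stations": [
            {"Name": "Station A", "Lat": 38.90, "Lon": -77.03, "LineCode1": "RD"},
            {"Name": "Station B", "Lat": 38.95, "Lon": -77.08, "LineCode1": "BL"},
        ]
    }
)

stations = parse_stations(sample_payload)
```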

Ingestion Wrangling Analysis Modeling Visualization

Page 7: Lets eat presentation_final_20160521

Wrangling and Munging

• Majority of time spent wrangling the data and building restaurants

–Removing duplicate and incomplete records

–Standardizing inconsistent fields (e.g. price)

–Aggregating and grouping

–Data types

• Merged restaurant and WMATA data using Euclidean distance
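
The merge step can be pictured as attaching the closest metro station to each restaurant. The sketch below is one way to do that, assuming plain Euclidean distance on raw latitude and longitude values (an approximation that ignores the curvature of the Earth). The function name, tuple layout, and coordinates are hypothetical.

```python
from math import hypot


def nearest_station(restaurant, stations):
    """Return (station_name, distance) for the station closest to the restaurant.

    restaurant: (latitude, longitude)
    stations: iterable of (name, latitude, longitude)
    """
    lat, lon = restaurant
    return min(
        ((name, hypot(lat - s_lat, lon - s_lon)) for name, s_lat, s_lon in stations),
        key=lambda pair: pair[1],
    )


# Placeholder coordinates for illustration only.
stations = [("Station A", 38.90, -77.03), ("Station B", 38.95, -77.08)]
name, distance = nearest_station((38.91, -77.04), stations)
```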

Page 8: Lets eat presentation_final_20160521

Data Overview

964 Total Restaurants

115,517 Total Menu Items

• Restaurant data includes:

–Name

–Location (address, latitude, longitude)

–Type of cuisine

–Menu (item, price, description)

• WMATA data includes:

–Station name

–Location (latitude, longitude)

–Metro Line

Page 9: Lets eat presentation_final_20160521

Analysis

10 cities

964 Restaurants

115,517 Menu Items

Page 10: Lets eat presentation_final_20160521

Analysis

964 Restaurants

115,517 Menu Items

Page 11: Lets eat presentation_final_20160521

Washington, D.C.

Page 12: Lets eat presentation_final_20160521

Washington, D.C.

Page 13: Lets eat presentation_final_20160521

Feature Selection

• Four feature extraction pipelines using sklearn

–Chunking

–Cuisine Type

• TfidfVectorizer

–Extract keywords and assign significance score

– Tokenize and chunk parts of speech using nltk

• LabelBinarizer

–Convert cuisine types to binary features

• FeatureUnion
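
A minimal sketch of how these pieces might fit together is shown below: one pipeline vectorizes menu text with `TfidfVectorizer`, another binarizes the cuisine type with `LabelBinarizer`, and `FeatureUnion` concatenates the two. The `FieldSelector` and `CuisineBinarizer` helper classes, the dictionary keys, and the sample rows are assumptions made for this example; the nltk-based tokenizing and chunking from the slide is replaced by the vectorizer's default tokenizer.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelBinarizer


class FieldSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: pull one field out of each restaurant dict."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [row[self.key] for row in X]


class CuisineBinarizer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper so LabelBinarizer can sit inside a Pipeline."""

    def fit(self, X, y=None):
        self.binarizer_ = LabelBinarizer().fit(X)
        return self

    def transform(self, X):
        return self.binarizer_.transform(X)


# Made-up rows for illustration only.
restaurants = [
    {"menu_text": "margherita pizza with fresh basil", "cuisine": "Italian"},
    {"menu_text": "spicy tuna roll and miso soup", "cuisine": "Japanese"},
    {"menu_text": "carne asada tacos with salsa", "cuisine": "Mexican"},
    {"menu_text": "pepperoni pizza and garlic bread", "cuisine": "Italian"},
]

features = FeatureUnion(
    [
        ("menu", Pipeline([
            ("select", FieldSelector("menu_text")),
            ("tfidf", TfidfVectorizer(stop_words="english")),
        ])),
        ("cuisine", Pipeline([
            ("select", FieldSelector("cuisine")),
            ("binarize", CuisineBinarizer()),
        ])),
    ]
)

# One row per restaurant; TF-IDF columns followed by one column per cuisine.
matrix = features.fit_transform(restaurants)
```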

Page 14: Lets eat presentation_final_20160521

Modeling and Prediction

• Transformation pipelines and transformed feature vectors pickled

• K-means models fitted using training restaurant data, then pickled

• User inputs entered via Flask are stored as training instance

• Relevant pipeline and model loaded to transform and predict
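
The fit, pickle, load, and predict cycle described above might look roughly like the following. The random vectors stand in for the transformed restaurant features, the model is pickled to an in-memory byte string rather than a file, and `n_clusters=2` is chosen only to keep the toy data small (the deck reports K=15). All variable names are hypothetical.

```python
import pickle

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for transformed restaurant feature vectors: two well-separated blobs.
training_vectors = np.vstack(
    [rng.normal(0.0, 0.1, (20, 4)), rng.normal(5.0, 0.1, (20, 4))]
)

# Fit once on the training data, then serialize the fitted model.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(training_vectors)
blob = pickle.dumps(model)

# Later (for example inside a web request handler): load and predict.
loaded = pickle.loads(blob)
user_vector = np.full((1, 4), 5.0)  # stand-in for a transformed user input
cluster = int(loaded.predict(user_vector)[0])
```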

Page 15: Lets eat presentation_final_20160521

K=15

Page 16: Lets eat presentation_final_20160521

Reporting and Visualization

• Restaurant recommendations are determined by similarity within a matched cluster

–“Similarity” is calculated by minimizing sklearn’s pairwise Euclidean distance function between the test data and the training instances in the feature space

• Predictions are exported into an interactive Tableau visualization

–Allows the user flexibility in making a selection through filtering and visual indicators
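
The similarity ranking can be sketched as follows: keep only the training restaurants in the user's matched cluster, compute Euclidean distances with sklearn's `euclidean_distances`, and return the closest ones. The `recommend` function, its parameters, and the toy vectors are assumptions for illustration, not the project's actual code.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances


def recommend(user_vector, train_vectors, train_labels, user_cluster, top_n=3):
    """Return indices of the top_n training rows closest to the user vector,
    restricted to rows assigned to the user's cluster."""
    members = np.flatnonzero(train_labels == user_cluster)
    distances = euclidean_distances(
        user_vector.reshape(1, -1), train_vectors[members]
    )[0]
    order = np.argsort(distances)[:top_n]
    return members[order].tolist()


# Toy feature space: row 3 belongs to a different cluster and is never returned.
train_vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [9.0, 9.0], [0.1, 0.1]])
train_labels = np.array([0, 0, 0, 1, 0])

picks = recommend(np.array([0.2, 0.0]), train_vectors, train_labels, user_cluster=0)
```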

Page 17: Lets eat presentation_final_20160521

Demo

Page 18: Lets eat presentation_final_20160521

Results

• Some predictions are good, others not so good

–Some clusters still contain a “hodge podge”

• Removing the “cuisine type” feature helped to eliminate what we saw as overfitting

• Different k values produced better results in some cases, worse in others

• Additional features (price, ratings, metro) would require more clusters and MORE DATA
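
One common way to compare k values more systematically, which the slides do not mention, is to compute a silhouette score for each candidate k. The sketch below does this on synthetic data with three obvious groups; the data, the range of k, and the variable names are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for restaurant feature vectors: three tight, distant blobs.
centers = [(0.0, 0.0), (6.0, 6.0), (0.0, 6.0)]
vectors = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in centers])

# Score each candidate k; higher silhouette means tighter, better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
    scores[k] = silhouette_score(vectors, labels)

best_k = max(scores, key=scores.get)
```

On real menu data the scores are unlikely to separate this cleanly, so this is a diagnostic aid rather than an automatic answer.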

Page 19: Lets eat presentation_final_20160521

Conclusions

• More data over a “better” model

• Might improve results using transformations like Singular Value Decomposition (SVD) or Latent Dirichlet Allocation (LDA)

– Better model analysis

• With more data, improve our tokenizer

– Incorporate stemming, improve chunking

• Incorporating user feedback into the prediction model (e.g. via the Flask interface)
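
As a small illustration of the SVD idea, the sketch below reduces a TF-IDF matrix to two dimensions with sklearn's `TruncatedSVD` before any clustering would take place. The menu strings and the choice of two components are made up for this example.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up menu descriptions for illustration only.
menus = [
    "margherita pizza with fresh basil",
    "pepperoni pizza and garlic bread",
    "spicy tuna roll and miso soup",
    "salmon sashimi with miso soup",
    "carne asada tacos with salsa",
]

tfidf = TfidfVectorizer().fit_transform(menus)

# Project the sparse keyword space onto a few dense latent dimensions.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(tfidf)
```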

Page 20: Lets eat presentation_final_20160521

Additional Opportunities

• “Waiter-caller” function that would allow users to log in, use the restaurant map search function, click on a restaurant, and be matched up with menu items based on keyword matches, as opposed to reading through an entire menu to find relevant items.

–Required more knowledge and implementation of JavaScript, CSS, and Jinja in the Flask environment.

• A sentiment analyzer was developed but not integrated. It would allow users to go to a restaurant and input a review. The review would then be analyzed, giving back a recommended score (1-5) to the user.

–Similar requirements
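
The keyword matching behind the “Waiter-caller” idea could be as simple as the sketch below, which counts how many of the user's keywords appear in each menu item's name or description. The function, the dictionary keys, and the sample menu are hypothetical; the slides do not describe how matching would have been implemented.

```python
def match_menu_items(menu, keywords):
    """Return (item_name, hit_count) pairs for items matching any keyword,
    ordered from most to fewest keyword hits."""
    wanted = {k.lower() for k in keywords}
    matches = []
    for entry in menu:
        text = f"{entry['item']} {entry['description']}".lower()
        words = set(text.replace(",", " ").split())
        hits = wanted & words
        if hits:
            matches.append((entry["item"], len(hits)))
    return sorted(matches, key=lambda m: -m[1])


# Made-up menu for illustration only.
menu = [
    {"item": "Spicy Tuna Roll", "description": "tuna, avocado, spicy mayo"},
    {"item": "Caesar Salad", "description": "romaine, parmesan"},
    {"item": "Tuna Melt", "description": "tuna salad on rye"},
]

results = match_menu_items(menu, ["spicy", "tuna"])
```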
