Lets eat presentation_final_20160521

Let’s Eat!

Brad Binder, Lesley Chapman, Jon Froiland, David Lee

Introduction

History:

Since 1979 there have been services that review and rank restaurants (Zagat)

Today:

According to Nielsen, Americans have on average 41 apps on their smartphones, many of which provide a recommendation service

Introduction

A variety of restaurant recommendation apps have been created

Features include finding restaurants, making reservations, and identifying healthy options

A Restaurant Recommender would aim to help users save money and time, and could help cure buyer’s remorse

Problem Summary

We need a tool that resolves the challenge of finding a restaurant in your area based upon specific cuisine and menu item criteria entered by the user

Hypothesis

Hypothesis: The Restaurant Recommender will recommend a restaurant more accurately than selecting a restaurant based on chance alone

H0 (null hypothesis): A user will find a restaurant that they like based on chance alone

HA (alternative hypothesis): The Restaurant Recommender app will provide a better restaurant suggestion to the user compared to chance alone

Data Ingestion

• WORM (write once, read many) storage

–Stored HTML menu pages in one location, where they could be read many times

• Parsed HTML with BeautifulSoup

–Built out a list of “Restaurant” objects

• GET requests to WMATA API to pull metro station data

–JSON data parsed with pandas read_json() function
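The parsing step can be sketched as follows. The slides do not show the menu pages' markup, so the HTML structure, the CSS class names, and the fields of the `Restaurant` class here are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

from bs4 import BeautifulSoup

# Hypothetical markup: the structure of the real stored pages is not shown.
SAMPLE_HTML = """
<div class="restaurant" data-cuisine="Thai">
  <h1 class="name">Example Thai Kitchen</h1>
  <ul class="menu">
    <li><span class="item">Pad Thai</span><span class="price">$12.50</span></li>
    <li><span class="item">Green Curry</span><span class="price">$13.00</span></li>
  </ul>
</div>
"""


@dataclass
class Restaurant:
    """Illustrative container; the project's actual class is not shown."""
    name: str
    cuisine: str
    menu: list = field(default_factory=list)  # (item, price_text) tuples


def parse_restaurants(html):
    """Build a list of Restaurant objects from one stored HTML page."""
    soup = BeautifulSoup(html, "html.parser")
    restaurants = []
    for block in soup.select("div.restaurant"):
        name = block.select_one(".name").get_text(strip=True)
        cuisine = block.get("data-cuisine", "")
        menu = [
            (li.select_one(".item").get_text(strip=True),
             li.select_one(".price").get_text(strip=True))
            for li in block.select("ul.menu li")
        ]
        restaurants.append(Restaurant(name, cuisine, menu))
    return restaurants


restaurants = parse_restaurants(SAMPLE_HTML)
```

Because the pages are stored once and only read afterwards, a parser like this can be rerun as often as needed without requesting the pages again.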

Ingestion → Wrangling → Analysis → Modeling → Visualization
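A rough sketch of the metro station pull, for illustration only. The endpoint URL, the `api_key` header, and the field names (`Stations`, `Name`, `Lat`, `Lon`, `LineCode1`) are assumptions based on WMATA's public developer documentation and should be checked against it. The sample payload is an offline stand-in with approximate values, not real API output.

```python
import io
import json
import urllib.request

import pandas as pd

# Assumed endpoint; verify against WMATA's developer documentation.
STATIONS_URL = "https://api.wmata.com/Rail.svc/json/jStations"


def fetch_station_json(api_key):
    """Send the GET request and return the raw JSON text (not called below)."""
    request = urllib.request.Request(STATIONS_URL, headers={"api_key": api_key})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def stations_to_frame(raw_json):
    """Parse the station records into a DataFrame with pandas.read_json."""
    records = json.loads(raw_json)["Stations"]
    frame = pd.read_json(io.StringIO(json.dumps(records)))
    return frame[["Name", "Lat", "Lon", "LineCode1"]]


# Offline stand-in for a response (illustrative values, assumed field names).
sample = json.dumps({"Stations": [
    {"Name": "Metro Center", "Lat": 38.8983, "Lon": -77.0281, "LineCode1": "RD"},
    {"Name": "Foggy Bottom-GWU", "Lat": 38.9006, "Lon": -77.0503, "LineCode1": "BL"},
]})
stations = stations_to_frame(sample)
```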

Wrangling and Munging

• Majority of time spent wrangling the data and building restaurants

–Removing duplicate and incomplete records

–Standardizing inconsistent fields (e.g. price)

–Aggregating and grouping

–Data types

• Merged restaurant and WMATA data using Euclidean distance
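The wrangling steps and the distance-based merge might look roughly like the sketch below. All column names and values are invented for illustration, and matching each restaurant to its single nearest station is one assumed reading of "merged using Euclidean distance".

```python
import numpy as np
import pandas as pd

# Toy records standing in for the scraped menu data (all values illustrative).
raw = pd.DataFrame({
    "restaurant": ["A Cafe", "A Cafe", "B Grill", "C Diner"],
    "item": ["Soup", "Soup", "Burger", None],
    "price": ["$4.50", "$4.50", "12", "7.00 USD"],
    "lat": [38.900, 38.900, 38.900, 38.910],
    "lon": [-77.030, -77.030, -77.050, -77.040],
})

clean = (
    raw.drop_duplicates()                # remove duplicate records
       .dropna(subset=["item"])          # remove incomplete records
       .assign(price=lambda df: pd.to_numeric(   # standardize price to float
           df["price"].str.extract(r"(\d+(?:\.\d+)?)", expand=False)))
       .reset_index(drop=True)
)

# Aggregate to one row per restaurant.
restaurants = clean.groupby("restaurant", as_index=False).agg(
    items=("item", "count"),
    mean_price=("price", "mean"),
    lat=("lat", "first"),
    lon=("lon", "first"),
)

# Hypothetical station table in the shape of the WMATA data.
stations = pd.DataFrame({
    "station": ["Station A", "Station B"],
    "lat": [38.898, 38.901],
    "lon": [-77.028, -77.050],
})

# Pairwise Euclidean distances, shape (n_restaurants, n_stations).
r_xy = restaurants[["lat", "lon"]].to_numpy()
s_xy = stations[["lat", "lon"]].to_numpy()
dist = np.sqrt(((r_xy[:, None, :] - s_xy[None, :, :]) ** 2).sum(axis=2))

# Attach the closest station to each restaurant.
nearest = dist.argmin(axis=1)
merged = restaurants.assign(
    nearest_station=stations["station"].to_numpy()[nearest],
    station_distance=dist[np.arange(len(restaurants)), nearest],
)
```

Euclidean distance on raw latitude and longitude is only an approximation, since a degree of longitude covers less ground than a degree of latitude. Within a single metro area it is usually adequate for picking the nearest station.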


Data Overview


964 Total Restaurants

115,517 Total Menu Items

• Restaurant data includes:

–Name

–Location (address, latitude, longitude)

–Type of cuisine

–Menu (item, price, description)

• WMATA data includes:

–Station name

–Location (latitude, longitude)

–Metro Line

Analysis


10 cities

964 Restaurants

115,517 Menu Items

Analysis


964 Restaurants

115,517 Menu Items

Washington, D.C.


Feature Selection

• Four feature extraction pipelines using sklearn

–Chunking

–Cuisine Type

• TfidfVectorizer

–Extract keywords and assign significance score

– Tokenize and chunk parts of speech using nltk

• LabelBinarizer

–Convert cuisine types to binary features

• FeatureUnion
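One way the pieces above could be combined is sketched here. The column names and sample rows are invented. Two substitutions are made for the sake of a runnable example: `OneHotEncoder` stands in for `LabelBinarizer`, which is designed for target labels and does not accept the `(X, y)` call signature used inside a pipeline, and the default `TfidfVectorizer` tokenizer stands in for the nltk-based tokenizing and chunking.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

# Illustrative data; column names are assumptions.
restaurants = pd.DataFrame({
    "menu_text": [
        "pad thai green curry spring rolls",
        "margherita pizza spaghetti carbonara",
        "cheeseburger fries milkshake",
    ],
    "cuisine": ["Thai", "Italian", "American"],
})


def select_menu(df):
    """Return the menu text as a 1-D series of documents."""
    return df["menu_text"]


def select_cuisine(df):
    """Return the cuisine column as a 2-D frame for the encoder."""
    return df[["cuisine"]]


features = FeatureUnion([
    # Keyword extraction with TF-IDF significance scores.
    ("menu", Pipeline([
        ("select", FunctionTransformer(select_menu)),
        ("tfidf", TfidfVectorizer()),
    ])),
    # Cuisine types converted to binary features.
    ("cuisine", Pipeline([
        ("select", FunctionTransformer(select_cuisine)),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])),
])

# Sparse matrix: one row per restaurant, TF-IDF columns then cuisine columns.
X = features.fit_transform(restaurants)
```

`FeatureUnion` fits each branch on the same input and concatenates the outputs column-wise, so text and categorical features end up in one feature space for clustering.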


Modeling and Prediction

• Transformation pipelines and transformed feature vectors pickled

• K-means models fitted using training restaurant data, then pickled

• User inputs entered via Flask are stored as a training instance

• Relevant pipeline and model loaded to transform and predict


K=15
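The fit, pickle, load, and predict cycle described on this slide can be sketched as below. The menu strings are invented, and the toy data uses 2 clusters rather than the K=15 shown above. The slide describes pickling the pipelines and models separately; for brevity this sketch bundles them into one pipeline and pickles to bytes in memory.

```python
import pickle

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Illustrative training "menus" (one string per restaurant).
menus = [
    "pad thai noodle curry",
    "green curry noodle soup",
    "spicy thai curry rice",
    "cheeseburger fries shake",
    "bacon burger fries",
    "double cheeseburger onion rings",
]

# Transformation and clustering in one pipeline; k=2 only suits the toy data.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("kmeans", KMeans(n_clusters=2, n_init=10, random_state=0)),
])
model.fit(menus)

# Serialize the fitted pipeline; an app would write these bytes to disk.
blob = pickle.dumps(model)

# Later, for example inside a request handler: load, transform, and predict.
restored = pickle.loads(blob)
user_cluster = int(restored.predict(["spicy noodle curry"])[0])
```

Pickled models should only be loaded from trusted sources and with the same sklearn version that produced them.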



Reporting and Visualization

• Restaurant recommendations are determined by similarity within a matched cluster

–“Similarity” is calculated by minimizing the pairwise Euclidean distance (computed with sklearn) between the test data and the training instances in the feature space

• Predictions are exported into an interactive Tableau visualization

–Allows the user flexibility in making a selection through filtering and visual indicators
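The within-cluster similarity ranking can be illustrated with a minimal sketch. The vectors, names, and cluster labels are made up; in the project they would come from the fitted pipeline and K-means model.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

# Made-up feature vectors and cluster labels for four training restaurants.
train_vectors = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [5.0, 5.0],
    [0.2, 0.1],
])
train_names = np.array(["A", "B", "C", "D"])
train_clusters = np.array([0, 0, 1, 0])

# The user's transformed input and its predicted cluster.
user_vector = np.array([[0.1, 0.1]])
user_cluster = 0

# Keep only training instances in the matched cluster.
in_cluster = np.flatnonzero(train_clusters == user_cluster)

# Distance from the user vector to each remaining instance.
distances = euclidean_distances(user_vector, train_vectors[in_cluster])[0]

# Smallest distance first: the most similar restaurants lead the list.
order = np.argsort(distances)
recommendations = train_names[in_cluster][order].tolist()
```

A ranked list like this could then be written to a file, for example with pandas `to_csv`, for use as a Tableau data source.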

Results

• Some predictions are good, others not so good

–Some clusters still contain a “hodgepodge”

• Removing the “cuisine type” feature helped to eliminate what we saw as overfitting

• Different k values produced better results in some cases, worse in others

• Additional features (price, ratings, metro) would require more clusters and MORE DATA

Conclusions

• More data over a “better” model

• Might improve results using transformations like Singular Value Decomposition (SVD) or Latent Dirichlet Allocation (LDA)

– Better model analysis

• With more data, improve our tokenizer

– Incorporate stemming, improve chunking

• Incorporating user feedback into prediction model (ex: Flask interface)
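As an illustration of the SVD idea, here is a small sketch that reduces TF-IDF vectors with sklearn's `TruncatedSVD` before any clustering. This is not something the project implemented; the menu strings and the choice of 2 components are arbitrary assumptions.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Illustrative menu strings (one per restaurant).
menus = [
    "pad thai noodle curry",
    "green curry noodle soup",
    "cheeseburger fries shake",
    "bacon burger fries",
]

# TF-IDF followed by truncated SVD, which accepts sparse input directly.
lsa = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),
)

# Dense array with one row per menu and one column per component.
reduced = lsa.fit_transform(menus)
```

Fewer, denser dimensions can make Euclidean distances between menus more meaningful than distances in the full sparse vocabulary space, though whether that helps here would need to be tested.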

Additional Opportunities

• “Waiter-caller” function that would allow users to log in, use the restaurant map search function, click on a restaurant, and be matched with menu items based on keyword matches, rather than reading through an entire menu to find relevant items.

–Required more knowledge and implementation of JavaScript, CSS, and Jinja in the Flask environment.

• Sentiment analyzer was developed but not integrated. It would allow users to visit a restaurant and enter a review, which would then be analyzed to return a recommended score (1-5) to the user.

–Similar requirements
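The keyword-matching idea behind the “Waiter-caller” function could be as simple as the sketch below. The function name, the menu layout, and the whole-word matching rule are assumptions; the slides do not describe the intended implementation.

```python
import re


def match_menu_items(menu, keywords):
    """Return names of menu items whose name or description contains a keyword.

    `menu` is a list of (item, description) pairs; matching is by whole word
    and ignores case.
    """
    wanted = {keyword.lower() for keyword in keywords}
    matches = []
    for item, description in menu:
        words = set(re.findall(r"[a-z]+", f"{item} {description}".lower()))
        if wanted & words:
            matches.append(item)
    return matches


# Illustrative menu for one restaurant.
menu = [
    ("Pad Thai", "Rice noodles with peanuts and lime"),
    ("Green Curry", "Coconut curry with basil and chicken"),
    ("Mango Sticky Rice", "Sweet coconut rice with fresh mango"),
]

coconut_items = match_menu_items(menu, ["coconut"])
noodle_items = match_menu_items(menu, ["Noodles", "peanuts"])
```

Stemming, as mentioned in the conclusions, would let “noodle” match “noodles”; this sketch matches exact words only.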

