YELP DATA CHALLENGE
Team Super 5
Madlen Ivanova
Kartik Niyogi
Saritha Ramkumar
Sampa Sanyal
Sugandha Mann
CONTENTS
INTRODUCTION
RESEARCH PROBLEM
DATA
ANALYSIS
I. ATTRIBUTES IMPACTING CONSUMER PREFERENCE
II. IDENTIFYING RELEVANT ATTRIBUTES THROUGH VOICE OF THE CONSUMER
III. CUSTOMER SEGMENTATION BASED ON REVIEW TEXT
BUSINESS STRATEGY RECOMMENDATION
CHALLENGES
CONCLUSION
REFERENCES
APPENDIX
INTRODUCTION
According to SAS, big data is a term that describes the large volume of data, both structured
and unstructured, that inundates a business on a day-to-day basis. But it is not the amount of
data that is important; it is what organizations do with the data that matters. Big data can be
analyzed for insights that lead to better decisions and strategic business moves.
Big data is relevant in all industries in this day and age. The amount of data that is created and
stored today is enormous and keeps increasing, which means there is even more potential to
draw key insights from business information.
The impact of big data on the restaurant business has grown manifold, just as it has in other
sectors such as banking, retail and pharmaceuticals.
Food Innovations:
Analytical practices are increasingly being used to guide the opening of new restaurants.
Several theme-based restaurants have been running successfully; for example, Amelie’s bakery,
a French-themed café, opened in downtown Charlotte after a successful run in NoDa.
Different food innovations are also setting trends in the ultra-competitive restaurant industry.
On the one hand, there are novelties such as the Cronut, a hybrid of a croissant and a donut
that is currently in vogue. On the other hand, food is also getting cleaner, leaner and healthier;
Panera Bread, for example, recently introduced a 100% clean menu of organic and locally
harvested food.
Farm-to-table sourcing has also been gaining ground for quite some time in Charlotte and
nearby places. Restaurants maintain their own terrace gardens or farms from which they source
their raw materials.
Meal-specific ingredient delivery by companies such as Blue Apron and HelloFresh is also in
vogue; these companies deliver all the ingredients needed to prepare a meal.
The impact of food innovation on restaurants has been tremendous, resulting in new theme-based
and multi-cuisine restaurants. Some of the fastest growing chains in the restaurant industry are
the ones embracing innovation throughout their operations, and limited-service chains in
particular are thriving by leveraging innovation in various forms. Innovations are being tried
everywhere from customer experience to restaurant ambience, kitchen operations and menus, and
innovation is also being used as a catalyst for franchise expansion. Mobile apps and other
services offered by Yelp, Groupon, ChowNow, etc. have become a lifeline for restaurants. They
help restaurants take orders online, making the ordering experience hassle free, and they
provide customer reviews that try to capture the whole dining experience, from parking to pets.
Beyond these services, mining the review text for useful information offers businesses an
innovative avenue to understand customer feedback and opportunities.
Charlotte Restaurant Market:
Charlotte is among the “top large markets” with 3+ million residents.
Table 1: North Carolina Restaurant Sales
According to the Charlotte Chamber of Commerce, there are more than 1,500 restaurants and bars
in Mecklenburg County. In the combined Charlotte, Concord, and Gastonia region, according to
the Bureau of Labor Statistics, that number grows to 4,382.
According to the National Restaurant Association, the number of restaurant and food service jobs
in North Carolina is expected to grow another 15% during the next 10 years.
Figure 1: North Carolina Restaurant Industry
The United States Commerce Department reported in 2015 that Americans now spend more
money at restaurants than they do at grocery stores. The average American family eats out 4.5
times a week.
For our study, we focus on the dataset released by Yelp as part of the Yelp Dataset Challenge 2017.
Our work aims at an in-depth understanding of consumers’ heterogeneous preferences
toward various restaurant attributes by analyzing consumers’ restaurant reviews on Yelp.com. We
identify the key attributes for particular cuisines (e.g., Italian, Japanese) in a specified area (e.g.,
residential, uptown) to help restaurant managers understand the Charlotte restaurant business, and
thereby guide investors when considering investment opportunities.
RESEARCH PROBLEM
Our goal is to provide a reference guide for future investors, so that they can make informed
decisions when opening a new restaurant or trying to improve an existing one. To do this, we
attempt to understand how cuisine preferences and locality influence the success of restaurants in
and around the Charlotte area. Charlotte is the third-fastest growing major city in the United
States, and its restaurants offer a plethora of options in terms of ambience, parking space, price
range, drive-through, delivery, pet friendliness, etc. Customers, on the other hand, are also keen
on a different set of attributes, such as friendliness of staff and wait time, along with food taste
and quality. The weight of these preferences varies considerably with both locality and cuisine
type. Hence, we found it important to analyze business characteristics and customer preferences
separately to get the complete picture.
To achieve this, the analysis proceeded with three main objectives:
• Identifying important business characteristics and trends of the Charlotte restaurant
industry
• Understanding customers’ overall preferences for restaurants
• Identifying multiple customer segments in Charlotte and comparing customers’
differential preferences across Charlotte areas and cuisine types
DATA
For the business problem discussed above, we decided to integrate the freely available Yelp
dataset from the Yelp Dataset Challenge 2017 with the City of Charlotte zoning data.
The Yelp dataset contained details of all businesses from 11 cities across 4 countries. The
primary task in data preparation was to convert the data from JSON format into CSV files using
Python. The data was then subset to include only Charlotte restaurant businesses. The final list
comprised 2,140 restaurant businesses with a total of 121K review texts. Data
preparation and merging were carried out in MySQL and Excel. The table below summarizes the
key aspects of the datasets.
Data File   Attributes
Users       User Information: User Name, Yelping Since
            Reviews: Number of reviews, Average Star rating
            Votes: Helpful votes provided and received (Cool, Cute, Funny, Hot)
Business    Business Information: Name, Address, Latitude, Longitude
            Attributes: Key features like Ambience, Parking, Alcohol, Outdoor seating, etc.
            Categories: Cuisine type (American, Italian, European, Latin, Chinese, Indian)
Reviews     Key Identifiers: Business ID and User ID
            Review Information: Review Text, Date
            Review Usefulness: Star Rating, Votes (Useful, Funny, Cool)
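The JSON-to-CSV conversion and the Charlotte restaurant subset described above can be sketched in Python as follows. This is a minimal illustration, not the project's actual script: it assumes the business file holds one JSON object per line, and the field names (`city`, `categories`, `stars`), the `"Restaurants"` category label and the three sample records are assumptions made for the example.

```python
import csv
import io
import json

# Three made-up records in JSON-lines form (one JSON object per line).
SAMPLE_JSON_LINES = """\
{"business_id": "b1", "name": "Example Grill", "city": "Charlotte", "stars": 4.0, "categories": ["Restaurants", "American"]}
{"business_id": "b2", "name": "Example Salon", "city": "Charlotte", "stars": 3.5, "categories": ["Hair Salons"]}
{"business_id": "b3", "name": "Example Diner", "city": "Phoenix", "stars": 3.0, "categories": ["Restaurants"]}
"""

# Columns to keep in the CSV output (assumed field names).
FIELDS = ["business_id", "name", "city", "stars"]


def is_charlotte_restaurant(record):
    """Return True for records located in Charlotte and tagged as restaurants."""
    categories = record.get("categories") or []
    if isinstance(categories, str):
        # Tolerate a comma-separated string as well as a list.
        categories = [c.strip() for c in categories.split(",")]
    return record.get("city") == "Charlotte" and "Restaurants" in categories


def json_lines_to_csv(json_file, csv_file):
    """Write the Charlotte restaurant rows to csv_file; return the row count."""
    writer = csv.DictWriter(csv_file, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    kept = 0
    for line in json_file:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if is_charlotte_restaurant(record):
            writer.writerow(record)
            kept += 1
    return kept


output = io.StringIO()
kept = json_lines_to_csv(io.StringIO(SAMPLE_JSON_LINES), output)
print(kept)
print(output.getvalue())
```

Reading line by line keeps memory use flat, which matters for a multi-gigabyte input; in practice the in-memory `StringIO` objects would be replaced by opened files.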
The second dataset is the City of Charlotte zoning data, which is coded in geospatial shapefiles
(.shp) with each zone marked by area and shape. The latitude and longitude of each restaurant in
the Yelp business file were matched against the zone boundaries to assign a zone to each
restaurant. This geospatial merge was carried out in R. The Charlotte zoning data had 23 unique
zones. For ease of analysis, the 17 city zones were rolled up into 5 zones (Uptown/Office,
Residential, Business/Industry, Institution/Research and Others) based on similarity, as shown
in the table below.
CHARLOTTE CITY ZONE       ROLLED UP ZONE
Business                  BUSINESS/INDUSTRY
Heavy Industrial          BUSINESS/INDUSTRY
Light Industrial          BUSINESS/INDUSTRY
Commercial Center         BUSINESS/INDUSTRY
Business-Distribution     BUSINESS/INDUSTRY
Multi-Family              RESIDENTIAL
Single Family             RESIDENTIAL
Manufactured Home         RESIDENTIAL
Urban Residential         RESIDENTIAL
Uptown Mixed Use          UPTOWN/OFFICE
Business Park             UPTOWN/OFFICE
Office                    UPTOWN/OFFICE
Mixed Use Residential     UPTOWN/OFFICE
Institutional             RESEARCH
Research                  RESEARCH
Mixed Use                 OTHERS
Transit Oriented          OTHERS
Table 2: Consolidated Zones
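The zone assignment described above is a point-in-polygon lookup. The project carried it out in R against the zoning shapefiles; the Python sketch below only illustrates the idea with a standard ray-casting test. The two rectangular zone outlines and the sample coordinates are made up for the example and are not taken from the zoning data.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices in order around the boundary.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular outlines standing in for real zone boundaries.
ZONES = {
    "Uptown/Office": [(-80.85, 35.22), (-80.83, 35.22), (-80.83, 35.24), (-80.85, 35.24)],
    "Residential": [(-80.83, 35.22), (-80.80, 35.22), (-80.80, 35.24), (-80.83, 35.24)],
}


def assign_zone(lon, lat, zones):
    """Return the name of the first zone containing the point, or None."""
    for name, polygon in zones.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


print(assign_zone(-80.84, 35.23, ZONES))
print(assign_zone(-80.81, 35.23, ZONES))
print(assign_zone(-80.70, 35.23, ZONES))
```

Real zone boundaries are irregular and may contain holes, so a geospatial library would normally handle the lookup; the sketch shows only the core test.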
ANALYSIS
Exploratory Data Analysis:
Figure 2: (Left) Review count by Cuisine. (Right) Box-plot of Star Rating by cuisine
As can be seen from Fig. 2 (Left), only a few restaurants account for the bulk of the reviews, while
most other restaurants have very few reviews. Fig. 2 (Right) shows that the median star rating
is approximately the same across cuisines.
I. ATTRIBUTES IMPACTING CONSUMER PREFERENCE
The focus is on understanding the relative importance of certain key (primary) attributes, along
with the utility value of their respective sub-attributes, based on consumer preference. Secondary
attributes are analyzed to understand their impact on star ratings.
The Business dataset contained an “Attribute” column that provided details of 29 attributes
describing the restaurants. Five primary attributes were shortlisted for conjoint analysis. The
remaining 24 secondary attributes were used in a regression model to analyze the impact
of these attributes on star rating.
Analyzing primary attributes:
The five primary attributes along with their respective sub-attributes are as below:
• Alcohol – {Wine & Beer, Full Bar, None}
• Ambience – {Casual, Classy, Divey, Hipster, Intimate, Romantic, Touristy, Trendy, Upscale,
None}
• Parking – {Garage, Lots, Street, Valet, Validated, None}
• Good For Meals – {Breakfast, Brunch, Lunch, Dinner, Dessert, Latenight, None}
• Price – {1,2,3,4}
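To make the idea of sub-attribute utilities and attribute importance concrete, the sketch below computes them for a toy example. It is a simplified illustration and not the procedure used in the project: part-worths are approximated as the deviation of each level's mean rating from the grand mean (for a balanced design like this toy one, that matches main-effects least-squares estimates), and relative importance is each attribute's part-worth range divided by the sum of ranges. The four profiles and their ratings are hypothetical.

```python
from collections import defaultdict

# Hypothetical restaurant profiles: attribute levels plus an observed star rating.
PROFILES = [
    {"Alcohol": "Full Bar", "Parking": "Lots", "stars": 4.5},
    {"Alcohol": "Full Bar", "Parking": "Street", "stars": 3.5},
    {"Alcohol": "None", "Parking": "Lots", "stars": 3.0},
    {"Alcohol": "None", "Parking": "Street", "stars": 2.0},
]


def part_worths(profiles, attribute, outcome="stars"):
    """Utility of each level: mean outcome for the level minus the grand mean."""
    grand_mean = sum(p[outcome] for p in profiles) / len(profiles)
    by_level = defaultdict(list)
    for p in profiles:
        by_level[p[attribute]].append(p[outcome])
    return {level: sum(v) / len(v) - grand_mean for level, v in by_level.items()}


def relative_importance(profiles, attributes, outcome="stars"):
    """Share of the total part-worth range contributed by each attribute."""
    ranges = {}
    for attribute in attributes:
        utilities = part_worths(profiles, attribute, outcome)
        ranges[attribute] = max(utilities.values()) - min(utilities.values())
    total = sum(ranges.values())
    return {attribute: r / total for attribute, r in ranges.items()}


alcohol_utilities = part_worths(PROFILES, "Alcohol")
importance = relative_importance(PROFILES, ["Alcohol", "Parking"])
print(alcohol_utilities)
print(importance)
```

In this toy data the Alcohol attribute spans a wider utility range than Parking, so it receives the larger importance share; the level with the highest utility within an attribute is the preferred sub-attribute.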
The results of the Conjoint Analysis are as shown in Table 3 below:
Table 3: Conjoint Analysis: Key attributes vs Cuisine/ Zone
By cuisine:
Attribute     American    Asian          European    Mexican/Latin   Others#
Alcohol       Wine_Beer   Wine_Beer      Wine_Beer   Wine_Beer       Wine_Beer
Ambience      Hipster     Classy         Divey       Hipster         Classy
Parking       Validated   Validated      Validated   Validated       Street
GFM           Dinner      Lunch/Dinner   Brunch      Dinner          Dessert
Price (1-4)   4           3/4            3           4               2/3

By zone:
Attribute     Business/Industry   Institution/Research   Others*         Residential   Uptown/Office
Alcohol       Wine_Beer           Full_Bar               Full_Bar        Wine_Beer     Wine_Beer
Ambience      Classy              Classy                 Romantic        Hipster       Hipster
Parking       Validated           Street                 Lots            Validated     Street
GFM           Dessert/Lunch       Lunch/Latenight        Dessert/Lunch   Brunch        Dessert
Price (1-4)   3/4                 1/2                    1               3/4           4

Legend (cell colors): Primary Significant, Secondary Significant, All others
# - Sandwich & Juice bars, Pizza, Burger joints
* - Transit Oriented
GFM: Good for Meals

The table maps the attributes against cuisine and zone based on importance. Blue marks the
attribute (and sub-attribute) most prominently associated with a particular cuisine or zone, while
orange marks the second most significant attribute (sub-attribute). For example, customers
visiting American cuisine restaurants value “Hipster” ambience and “Validated” parking as the
primary significant attributes, while European cuisine customers value “Divey” ambience as the
primary attribute, followed by alcohol availability (specifically “Wine & Beer”) as the secondary
significant attribute. Similarly, for zones, customers dining in the Uptown/Office area prefer
“Hipster” ambience as the primary attribute and an “expensively” priced restaurant as the
secondary attribute.
The conjoint table helps us map the preferences of consumers across cuisines and zones. Based on
this, we can derive the opportunity zones for each cuisine, as shown in Fig. 3 below:
Figure 3: Opportunity Matrix between Cuisine and Zone
A visual representation on the Charlotte map is shown in Fig. 4 below:
Figure 4: Charlotte Map: Zone vs Cuisine
Analyzing secondary attributes:
The impact of the remaining 24 secondary attributes on star rating is shown in Table 4
below:
Table 4: Regression: Star Rating vs secondary variables

Regression Output
Dependent Variable     Stars
Obs                    2140
R-Square               0.16
Root MSE               0.71

Significant Variable   Estimate
Bike Parking           0.31
Caters                 0.14
Drive Thru             -0.77
Delivery               -0.09
Wheel Chair            0.13
Happy Hours            -0.31

Non-Significant Variables: Credit Card, Take Out, Good For Kids, WiFi, Has TV, BitCoin,
Noise Level, BYOB, Outdoor Seating, Dogs Allowed, Attire, Music, Good For Groups, Best Night,
Reservation, Diet Restrictions, Table Service

The regression output is characterized by the following:
• A low R2 of 0.16 indicates that the model has little explanatory power
• Only 6 of the 24 variables are significant
• The parameter estimates are not in line with general expectations:
o Drive Thru, Delivery and Happy Hours are negative
o One would expect a higher star rating when these facilities are offered

A possible explanation for this observation is that high-end restaurants generally garner higher
star ratings than daily eat-out restaurants. While most daily eat-out restaurants provide a
drive-thru facility, high-end restaurants do not. This can be seen from the table below:

Drive Thru?   Avg. Star Rating   Restaurants
True          2.6                McDonald’s, Wendy’s, Burger King, Chick-fil-A, etc.
False         3.4                Amelie’s French Bakery, Mojo's Famous, Noodles & Company, etc.

For daily eat-outs, customers generally write reviews when they are unhappy, e.g. about cold
pizza, a mixed-up order or a long wait time. So the low average rating for daily eat-outs is not
on account of facilities like drive-thru or delivery service, but possibly of other, larger service
issues.
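The relationship between a yes/no attribute and the star rating can be illustrated with a one-predictor least-squares fit. This sketch is far simpler than the 24-variable regression reported above and uses made-up data: with a single 0/1 predictor, the slope equals the difference between the two group means, which is why a negative drive-thru estimate can reflect the kind of restaurant that offers drive-thru rather than the facility itself.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return intercept, slope, r2


# Hypothetical data: 1 = offers drive-thru, 0 = does not; stars are made up.
drive_thru = [1, 1, 0, 0, 0, 0]
stars = [2.5, 3.0, 3.5, 4.0, 3.0, 3.5]

intercept, slope, r2 = simple_ols(drive_thru, stars)
print(intercept, slope, r2)
```

Here the intercept is the mean rating of restaurants without drive-thru and the slope is the gap between the two groups; an R2 well below 1 signals that most of the rating variation is left unexplained.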
II. IDENTIFYING RELEVANT ATTRIBUTES THROUGH VOICE OF THE CONSUMER
The objective of this exercise was to use text categorization to identify key user topics
and attributes within the restaurant category. The approach was to identify key attributes
that were not already part of the analysis in Part 1 and Part 2 but are extremely critical to the
industry. The key user topics that were finalized were the following:
• Customer Experience (service, food, ambience, décor, wait, order, staff and extra amenities
like bike parking, BYOB, etc. were the key attributes included as part of the
“Customer Experience” topic)
• Taste
• Wait Time
• Go with
• Entertainment (Music, T.V etc.)
Approach Taken for Text Categorization
• A restaurant ontology was generated after going through the entire review corpus. A sample
list is attached in Appendix Table 18
• Standard English stop words were used in addition to customized stop words
• Text categorization was done across the entire review corpus and across specific cuisines
• This was done in an iterative fashion to remove terms of no interest and highlight those
which would drive insights
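The approach above can be sketched as a keyword lookup against a topic ontology. This is an illustration only: the project used SAS text categorization, and the ontology terms, stop words and sample reviews below are hypothetical stand-ins rather than the lists in Appendix Table 18.

```python
import re
from collections import Counter

# Hypothetical ontology: each user topic maps to a small set of trigger terms.
ONTOLOGY = {
    "Customer Experience": {"service", "staff", "ambience", "decor", "order"},
    "Taste": {"tasty", "delicious", "flavor", "bland"},
    "Wait Time": {"wait", "slow", "quick"},
    "Go With": {"family", "friends", "kids", "date"},
    "Entertainment": {"music", "tv", "noise"},
}

# A tiny stand-in for the standard plus customized stop word lists.
STOPWORDS = {"the", "was", "and", "but", "a"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def categorize(text):
    """Count ontology term hits per topic for one review."""
    tokens = tokenize(text)
    counts = Counter()
    for topic, terms in ONTOLOGY.items():
        hits = sum(1 for token in tokens if token in terms)
        if hits:
            counts[topic] = hits
    return counts


def topic_totals(reviews):
    """Aggregate topic hit counts over a collection of reviews."""
    totals = Counter()
    for review in reviews:
        totals.update(categorize(review))
    return totals


SAMPLE_REVIEWS = [
    "The staff was friendly and the service was quick.",
    "Delicious tacos, but the music was loud.",
    "Long wait, slow service, bland food.",
]

totals = topic_totals(SAMPLE_REVIEWS)
print(totals.most_common())
```

Iterating on the ontology and stop word lists, as the last bullet describes, amounts to editing `ONTOLOGY` and `STOPWORDS` and re-running the counts until the leading terms are informative.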
Table 6: No of occurrences across the American Cuisine subcategory
As can be seen from Table 5, Customer Experience and Taste remain the top two key attributes
across the entire corpus and across specific cuisine types. Another view of the leading term
frequencies is as follows:
Figure 5: Leading term frequencies across the entire corpus (Food 31%, Place 26%, Service 17%, Time 16%, Menu 10%)
Table 5: No of occurrences across the entire corpus
The above chart validates that, based on term frequencies, Food and Service together
drive customer reviews and preferences within the industry. The key terms that form the
leading user topics are shown below.
Table 7: User Topics for various attributes (panels: Customer Experience, Entertainment, Go With, Taste)
As we can see from Table 7, customer reviews were classified across four user topics, with
Customer Experience and Taste as the leading “user topics”.
The highest-frequency terms in each topic are also displayed, with their frequency counts across
cuisines and the leading terms that drive each of them. For example, under Customer Experience
we can see terms like place, service and time, which help us draw insights on the features,
attributes or parameters that are important to the “Customer Experience” category.
In a similar way, key terms under the category “Entertainment” can be seen in Table 7. Terms
like “noise”, “television” and “music” lead this category and indicate the type of
“Entertainment” preferences of the consumer.
Text categorization was also used to mine key terms across “Highly Rated” and “Lower Rated”
restaurants. For this, the corpus was split by taking restaurants with 4-star and 5-star ratings
as “High” and those rated below 4 as “Low”.
The output is as follows:
Table 8: Topics for highly rated restaurants
From the above we can summarize the following:
• Machine-generated topics are more prominent than user-provided topics
• Drivers of good ratings can be clearly seen across the categories of Food, Service and
Price, as highlighted in the above table
• Term frequencies also seem to indicate the same result
Figure 6: Concept Link
From the above we can see strong term associations among “food”, “good food”, “service”
and “great service”. These terms are indicative of the key attributes that drive higher ratings
across the restaurant category.
Table 9: Topics Driving Negative Restaurant Reviews
From Table 9 above, we can see that negative ratings are driven mainly by service and amenities
within the restaurant, and not specifically by food. Exploring this further via concept links,
we can see a strong association between negative terms like “horrible”, “bad” and
“slow”. The topics have been highlighted in the above table for clarity.
We also find that the concept link of “Food” does not have any negative terms strongly
associated with it. This highlights that “Service” remains one of the key attributes driving
negative reviews within this industry.
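The high/low split and the term associations discussed above can be approximated with a simple co-occurrence share: among low-rated reviews that mention an anchor term, what fraction also mention each negative term? This is a stand-in for illustration; SAS concept links use their own association measure, and the reviews and ratings below are made up.

```python
# Hypothetical (stars, review text) pairs.
REVIEWS = [
    (5, "great food and great service"),
    (2, "horrible slow service"),
    (1, "bad service and slow kitchen"),
    (4, "good food nice patio"),
    (3, "food was fine but service was rude"),
]

NEGATIVE_TERMS = ["horrible", "bad", "slow"]


def low_rated(reviews, threshold=4):
    """Return the texts of reviews rated below the threshold (the 'Low' group)."""
    return [text for stars, text in reviews if stars < threshold]


def cooccurrence_share(texts, anchor, terms):
    """Among texts containing the anchor word, the share that also contain each term."""
    docs = [set(text.lower().split()) for text in texts]
    with_anchor = [doc for doc in docs if anchor in doc]
    if not with_anchor:
        return {term: 0.0 for term in terms}
    return {
        term: sum(term in doc for doc in with_anchor) / len(with_anchor)
        for term in terms
    }


low = low_rated(REVIEWS)
service_links = cooccurrence_share(low, "service", NEGATIVE_TERMS)
food_links = cooccurrence_share(low, "food", NEGATIVE_TERMS)
print(service_links)
print(food_links)
```

In this toy data the negative terms attach to “service” and not to “food”, mirroring the pattern described in the text.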
Comparison of Leading Key Terms Across Different Cuisine Categories:
The objective here was to use text categorization to understand the differences in key terms
across cuisine types, and thereby gather insights into the differences, similarities and
characteristics within the cuisine category.
Table 10: Key Terms across Cuisine
Food
  American: specials, order, server
  Asian: Chinese, Japanese, Thai, delicious food, expansive menu, reasonable price
  European: service, dress, décor
  Mexican: authentic, service, people, price, atmosphere
  Others: terrible, server, atmosphere, price, service
Place
  American: vibes, music, specials
  Asian: nice, clean, atmosphere, park, family friendly
  European: pack, people, understand, atmosphere, date
  Mexican: patio, table, silverware
  Others: vibe, cool, sit, server, music
Service
  American: return, sit, customer service, server
  Asian: table, staff, fast, efficient, server
  European: bad, excellent, slow, staff
  Mexican: server/staff, experience, location, slow
  Others: order, server, arrive, people, TV
Time
  American: server, menu, seat, hard time
  Asian: wait, server, whole time, hard, order
  European: arrive, order, wait time
  Mexican: server, order, arrive
  Others: explain, combo, serve, attitude
Menu
  American: special, tasty
  Asian: variety, desserts
  European: specials, flavor, crispy, perfect texture, tasty
  Mexican: explain, combo, serve, attitude
  Others: special, tasty, restaurant
Great/Good
  American: food, drink specials, variety
  Asian: food, drink, offer, town, spot, customer service
  European: location, food, experience, selection
  Mexican: vibe, outdoor, music, food, specials, variety
  Others: food, staff, atmosphere

From the above table, we can gather the key terms that characterize a cuisine type and hence
drive consumer preferences within that particular segment. For example, American cuisine
highlights “specials” under Food, indicating the customer preference within this category for
special offers, special combos, etc. The Place term is highlighted with sub-terms like “vibes”,
“music” and “specials”, throwing light on customer preferences regarding the ambience and
setting of the restaurant.
Hence, with this one-stop view, text categorization helps us assess the key
characteristics that drive each of these categories across key terms.
III. CUSTOMER SEGMENTATION BASED ON REVIEW TEXT
The objective of the analysis is to segment users into different groups based on the key aspects
they consider when reviewing a restaurant. Different users give preference to different facets
of dining at a restaurant. Beyond the taste and quality of food, people may consider the
hospitality of staff, wait time, etc. The analysis looks for these features by clustering the
reviews and then cross-referencing the clusters with each of the zones and cuisine types
separately. This helps establish how customer preferences vary across zones as well as cuisine
types. We also cross-reference the results with the review rating provided by the user to assess
sentiment, so that we can gauge user satisfaction with the food, service and wait time. Lastly,
tabulating the number of open and closed restaurants against the customer segments helps in
understanding how these preferences affect the overall success of a restaurant.
A comprehensive stopword list that eliminated other attributes and keywords was pivotal in the
data processing. The stopwords used are listed in Table 11 below:
burrito finally Rice lunch fries sides restaurant Steak
crust mexican Beer Meal salad hotdog pasta noodle
fish pizzas Bar Meat wing hot soup Roll
pizza tacos bread menu wings dog sushi Thai
salsa toppings burger Pork fried chilli potato Broth
taco sauce chicken sandwich Queso drink mac chinese
beans cheese dinner Wine egg eat chip indian
chips shrimp Dish Food coffee price $ vietnamese
ramen pho Charlotte Bean Tortilla Chipotle breakfast
Table 11: List of Stopwords
The analysis was carried out in the SAS Enterprise Miner (E-Miner) Text Mining interface. SAS
E-Miner offers two types of text clustering mechanisms: Expectation Maximization and
Hierarchical. As with any clustering mechanism, the primary challenge was to identify the ideal
number of clusters; various combinations were tried to find the most meaningful and logical
representation of the text. After a couple of iterations, it was decided to go with three
clusters, the key attributes of which are summarized in Table 12 below:
Table 12: Summarization of Key Attributes
As is evident from the above table, the three clusters were cohesive (indicated by the
low RMSTD) and equidistant in the vector space. The proportion of review texts was also almost
uniform across the clusters. Based on the descriptive terms in the cluster definitions, the
clusters were labeled Time Bound, Foodies, and Service/Atmosphere Bound. These three clusters
are indicative of three common customer segments in the restaurant market.
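The idea of grouping reviews into segments by their wording can be illustrated with a small clustering sketch. Note the substitution: the project used the Expectation Maximization and Hierarchical options in SAS E-Miner, whereas the code below uses plain k-means on term-count vectors because it is short enough to show in full. The reviews, vocabulary, stop words and starting centroids are all hypothetical.

```python
import math
from collections import Counter

# Tiny stand-in stop word list and vocabulary (both hypothetical).
STOPWORDS = {"the", "was", "and", "a", "for"}
VOCABULARY = ["wait", "slow", "delicious", "tasty", "friendly", "atmosphere"]

SAMPLE_REVIEWS = [
    "long wait for a table slow line",
    "wait time was long and slow",
    "delicious food tasty fresh flavors",
    "tasty delicious fresh food",
    "friendly staff great atmosphere",
    "great atmosphere and friendly staff",
]


def vectorize(text, vocabulary):
    """Turn a review into a vector of term counts over the vocabulary."""
    counts = Counter(w for w in text.lower().split() if w not in STOPWORDS)
    return [counts[term] for term in vocabulary]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans(vectors, initial_centroids, iterations=10):
    """Assign each vector to its nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: distance(vec, centroids[k]))
            for vec in vectors
        ]
        for k in range(len(centroids)):
            members = [vec for vec, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vectors = [vectorize(r, VOCABULARY) for r in SAMPLE_REVIEWS]
# Seed one centroid from each visibly different review to keep the run deterministic.
labels = kmeans(vectors, [vectors[0], vectors[2], vectors[4]])
print(labels)
```

With these toy reviews the three groups correspond to wait-related, food-related and service/atmosphere-related wording, analogous to the Time Bound, Foodies and Service/Atmosphere Bound labels; on real data the number of clusters and the starting points need the kind of iteration described above.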
Analyzing the customer segments:
The next task was to analyze the distribution of customer segments across
zones and cuisines. The pie charts below summarize this distribution:
Figure 7: Distribution by Zone
The bigger pie chart shows the number of reviews for restaurants by zone, and the
smaller pie charts show the customer segment distribution in each zone, identified by the
matching color. It is evident that customers in the uptown area are more inclined towards
service and atmosphere, whereas customers in the business and industry zones are more focused
on the taste and quality of food.
Figure 8: Distribution by Cuisine
This chart is analogous to Fig. 7 above, but shows the cuisine-wise distribution instead of
the zones. It is clear that customers talk a lot about the service and atmosphere of European
and Asian restaurants.
Once the zonal and cuisine-wise distributions were understood, the analysis proceeded to
examine how customer likings vary by zone. The average rating given by the customers who
provided the reviews was used to understand customer sentiment. Table 13 below summarizes the
mean customer rating by zone and cuisine for each of the customer segments identified above:
Table 13: Mean Customer Rating by Zone and Cuisine
Lastly, the above segments by zone and cuisine were compared based on the percentage
of closed restaurants, as shown in Fig. 9 below, which helped validate Table 13 with respect
to the overall success of the restaurants.
Figure 9: Percentage of Closed Restaurants by Zone/ Cuisine
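The two summaries used in this part of the analysis, mean rating per zone and segment and percentage of closed restaurants per zone, are simple grouped aggregations. The sketch below shows them on made-up rows; the field names (`zone`, `segment`, `stars`, `is_open`) are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical review rows, each already tagged with a zone and a customer segment.
REVIEW_ROWS = [
    {"zone": "Uptown/Office", "segment": "Time Bound", "stars": 3},
    {"zone": "Uptown/Office", "segment": "Time Bound", "stars": 4},
    {"zone": "Uptown/Office", "segment": "Foodies", "stars": 5},
    {"zone": "Residential", "segment": "Foodies", "stars": 4},
]

# Hypothetical restaurant rows with an open/closed flag.
RESTAURANT_ROWS = [
    {"zone": "Uptown/Office", "is_open": False},
    {"zone": "Uptown/Office", "is_open": True},
    {"zone": "Residential", "is_open": True},
    {"zone": "Residential", "is_open": True},
]


def mean_rating_by_zone_segment(reviews):
    """Average star rating for each (zone, segment) pair."""
    groups = defaultdict(list)
    for row in reviews:
        groups[(row["zone"], row["segment"])].append(row["stars"])
    return {key: sum(values) / len(values) for key, values in groups.items()}


def percent_closed_by_zone(restaurants):
    """Percentage of restaurants in each zone that are closed."""
    total = defaultdict(int)
    closed = defaultdict(int)
    for row in restaurants:
        total[row["zone"]] += 1
        if not row["is_open"]:
            closed[row["zone"]] += 1
    return {zone: 100.0 * closed[zone] / total[zone] for zone in total}


ratings = mean_rating_by_zone_segment(REVIEW_ROWS)
closed_share = percent_closed_by_zone(RESTAURANT_ROWS)
print(ratings)
print(closed_share)
```

Replacing `zone` with a cuisine field gives the cuisine-wise versions of the same tables.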
BUSINESS STRATEGY RECOMMENDATION:
Case Study:
The case study applies the conclusions of our analysis. We applied the discovered customer
preferences to two instances from our data (a successful and an unsuccessful restaurant) to
confirm our research. We chose the restaurants with the highest number of reviews, since more
reviews mean a more objective public opinion, which better helps us determine the reasons that
contributed to the unsuccessful case (the restaurant shutting down).
Table 14: “Pinky’s Westside Grill” vs “Nan and Byron’s”
Successful case: “Pinky’s Westside Grill”
The successful case is an American cuisine restaurant located in a residential zone. It has
been in business for about 7 years and has a rating of 4 stars, as shown in Table 14 above.
Unsuccessful case: “Nan and Byron’s”
The unsuccessful case was an American cuisine restaurant located in the uptown area. It closed
down after 3.5 years and had a rating of 3.5 stars at the time of closing.
Comparison:
First, we wanted to consider the average popularity of the two restaurants in terms of people
reviewing them, so that we could compare them fairly. We simply divided the number of reviews
by the length of time each restaurant had been in business. Both restaurants have a similar
number of reviews per month and a similar rating. The successful case still leads, but the
difference between the two is not significant. This made our case study even more intriguing:
given that the restaurants had similar popularity and ratings, what might be the reasons for
one of them to close and the other to stay in business?
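The popularity normalization is a one-line calculation, shown here for clarity. The years in business (7 and 3.5) come from the text; the review counts are hypothetical placeholders, since the actual counts appear only in Table 14.

```python
def reviews_per_month(review_count, years_in_business):
    """Average number of reviews received per month in business."""
    return review_count / (years_in_business * 12)


# Review counts below are hypothetical placeholders, not the Table 14 values.
open_case = reviews_per_month(review_count=840, years_in_business=7)
closed_case = reviews_per_month(review_count=378, years_in_business=3.5)
print(open_case, closed_case)
```

Dividing by time in business puts a long-running and a short-lived restaurant on the same footing before comparing them.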
The following table describes the profile of a successful American cuisine restaurant
opened in a residential or uptown area:
*GFM: Good for Meals
Table 15: Profile of successful restaurant by cuisine and zone
From the conjoint analysis, we concluded that a successful restaurant that opens in a
residential area needs to offer at least wine and beer, have hipster ambience and validated
parking, and be good for dinner and brunch. Additionally, from our customer segmentation
analysis, we concluded that in the residential zone and for an American cuisine restaurant,
people value food and waiting time the most.
On the other hand, we determined that a successful American cuisine restaurant that opens in
the uptown area should provide at least wine and beer, have hipster ambience, and offer
validated or street parking. It should be good primarily for dinner, while dessert is also an
important dining aspect. From our customer segmentation analysis, we concluded that in the
uptown area and for an American cuisine restaurant, people value service and waiting time the
most.
Next, we compared the characteristics of the two restaurants with the conclusions from our
research. We used the reviews of the two restaurants to find out what might have caused
“Nan and Byron’s” to close its doors. Comparing the characteristics of a successful
restaurant with the profiles of the two restaurants, we observe the following:
*GFM: Good for Meals
Table 16: Comparing Successful vs Unsuccessful Restaurants
Successful case:
“Pinky’s Westside Grill” satisfies the customer needs for choice of drinks and preferred
ambience (highlighted in green). It is good for lunch and dinner and, according to the reviews,
it has very good food and a short waiting time. It does not have “validated parking”, but it
offers a parking lot, which is acceptable.
Unsuccessful case:
“Nan and Byron’s” satisfied the customer needs for choice of drinks. According to our analysis,
people preferred validated or street parking, and we considered its parking lot
acceptable. However, the restaurant did not match the customer preferences for ambience. As per
the conjoint analysis, “Nan and Byron’s” seemed to be preferred more for brunch, whereas
customers in the uptown location valued dinner and dessert. Moreover, people in this zone
valued service the most, but according to the reviews, this restaurant did not have the best
service. It had good food, but that was not enough to keep its customers satisfied, and this
could have been a reason for it to shut down. In essence, had this restaurant better understood
its customer segment and taken remedial action to match customer preferences, it might have
stayed in business longer.
Key Recommendations:
Irrespective of the features offered at a restaurant, customers look for three major factors:
timely service, hospitality of staff and quality of food. However, the preference among these
attributes may vary highly with locality and cuisine type. Hence, in order to cater to these
demands, even different franchises of the same chain may have to tailor these attributes based
on zone and/or cuisine type. For example, customers of uptown restaurants are more concerned
about wait time than about food or staff hospitality, whereas customers who reviewed
restaurants in the business/industrial area are more inclined towards food taste and quality
than towards waiting time or the services offered. Similarly, if an investor is looking to
invest in an American cuisine restaurant, then the ideal value proposition to customers would
be “Hipster” ambience along with “validated” parking. Consumers would value specials along with
variety in menu options. Timely service of the order is also a critical aspect. On the other
hand, if an investor is choosing an Institution/Research location, then the ideal value
proposition to customers would be a relatively less expensive restaurant with “Classy”
ambience. Improved food quality at the same price would help gain better market share.
CHALLENGES
The biggest challenge during the data collection phase of the project was handling the large
dataset of more than 4 GB, with more than one million rows in JSON format. The data conversion
was done using Python, and the data was loaded into a MySQL database for merging and filtering.
Later, cleaning the text data, such as removing non-English and special characters, was a
tedious task; however, the built-in features of SAS E-Miner simplified the process.
As we proceeded with the detailed analysis of text, dealing with sarcasm in the reviews and
differentiating genuine from made-up reviews posed a greater challenge. Preparing a
restaurant-specific taxonomy and stopword list helped ease this process. Lastly, in customer
segmentation, text clustering had to be executed in multiple rounds to arrive at meaningful and
logical clusters.
Our recommendations are primarily for the Charlotte area. As such, this outcome might not be
fully applicable to other regions within the US.
We also have not taken into account other factors that impact the restaurant business, such as
the restaurant model (franchise or single-owner), operational cost, profit margin, employee
skills, managerial acumen, etc., as these were beyond the scope of our project.
CONCLUSION
Charlotte is among the fastest growing cities, with rapid expansion in both business and
population. We found it important to understand how cuisine preferences and locality influence
the success of restaurants in Charlotte, and developed a reference guide for future investors
so that they can make an educated decision when executing their projects.
For our first objective, we focused on identifying key attributes of the Charlotte restaurant
industry. The primary attributes that we identified are specific to the Charlotte region; their
importance might vary for other regions.
For our second objective of understanding customers’ overall preferences for
restaurants, we built a term matrix detailing key terms across different cuisines.
Our third objective was to identify multiple customer segments in Charlotte and contrast
customers’ differential preferences among Charlotte areas. Based on our customer segmentation
analysis, we found three primary customer segments: Time Bound, Service/Atmosphere Bound,
and Foodie customers. We concluded that higher customer satisfaction is driven by Food and
Service, while lower satisfaction levels are primarily accounted for by poor Service/Ambience.
REFERENCES:
Primary dataset: https://www.yelp.com/dataset_challenge
SAS, Big Data – What is it and Why it Matters!
https://www.sas.com/en_us/insights/big-data/what-is-big-data.html#
City of Charlotte Zoning Data:
http://cltcharlotte.opendata.arcgis.com/datasets/17a4cbd948934fae8a63139a8e371000_8
National Restaurant Association, “Big Data and Restaurants: Something to Chew On”
Matt Wolff, March 2011, “The Best 10”, Restaurant Growth Index
APPENDIX:
Table 17: Working for Conjoint Analysis - Cuisine
Table 18: Sample Ontology for Topic Modeling