Data Science for the Hospitality Domain
Dr. Nicolas Nicolov
Sr. Director, Head of Data Science, OpenTable, Inc.
1 Montgomery St., San Francisco, CA 94104, U.S.A.
OpenTable
• Seated over 1B diners since 1999; $45B spent at partner restaurants.
• 20M diners / month.
• 42M reviews created since 2008 (650K reviews / month).
• 600 partners: Google, TripAdvisor, Bing, Yahoo, Zagat, Eater …
Part of the Priceline group:
US: 24,194 reservable / 66,109 total
UK: 5,389 reservable / 6,832 total
World-wide: 37,861 reservable / 87,328 total
Restaurant Cuisines: Top 5 Cities & Globally

World-wide: 1. Italian, 2. Seafood, 3. American, 4. Steak, 5. Japanese
New York City: 1. Italian, 2. American, 3. Japanese, 4. Seafood, 5. French
London: 1. Italian, 2. Japanese, 3. Indian, 4. Steak, 5. Asian
San Francisco: 1. Italian, 2. Seafood, 3. American, 4. Steak, 5. Japanese
Chicago: 1. Italian, 2. American, 3. Steak, 4. Seafood, 5. Steakhouse
Washington DC: 1. American, 2. Italian, 3. Seafood, 4. Contemporary Am., 5. Steak
Mobile First
More than 50% of reservations are made on mobile.
Discovery tab / Collections:
• iOS launched: June 2016.
• Android launched: Nov 7, 2016.
Collections:
Data Science at OpenTable
Areas:
• Autocomplete.
• Search (indexing, ranking).
• Recommendations.
• Inventory optimization.
• Advertising / promoted inventory.
• Content analysis.

Projects:
• Autocomplete.
• Tagging.
• Cuisine / menu analysis.
• Search (all platforms).
• Similarity: user-user / restaurant-restaurant.
• Recommendations (Web, Collections, Emails; explanations).
• Inventory optimization (cover/demand prediction, simulation, tracking lift).
• Sentiment: review analysis.
• Review selection.
• SEO: Points Of Interest (POIs).
• Wait time prediction.
• Turn time prediction.
Search
Autocomplete: location names / cuisines / tags
Search pipeline: retrieval, facets, ranking, tags → ranked search results.
Facets ~ search keywords (e.g., dishes).
Frequent queries: iPhone / iPad
Time to book
[Chart: distribution of time-to-book. Annotations: "People have to sleep at some point"; "20 days in advance".]
Hierarchical Cuisines
Machine Learning Ranking
Recommendations
Personalized Restaurant Ranking
Alice: 91%, 87%, 85%, 84%, 79%, 78%, 60%, 59%, 58%, 57%, 20%, 19%, …
Bob: 95%, 91%, 87%, 85%, 80%, 78%, 71%, 69%, 61%, 57%, 12%, 10%, …
Topic Models: fingerprints for restaurants, from our diners' perspective.
[Word cloud of topic terms: Italian food, pizza, wine, waiters, expensive.]
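As a rough illustration of how such per-restaurant fingerprints could be derived from review text, here is a minimal topic-model sketch (hypothetical reviews, scikit-learn LDA; the topic count and preprocessing are assumptions, not OpenTable's pipeline):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical review snippets for one restaurant.
reviews = [
    "great italian food, amazing pizza and a long wine list",
    "waiters were friendly but the wine was expensive",
    "expensive for pizza, though the italian dishes were excellent",
]
counts = CountVectorizer(stop_words="english").fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# The restaurant's "fingerprint": its average topic distribution across reviews.
print(lda.transform(counts).mean(axis=0))
```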
Ingredients of a Recommendation Engine
Personalized subgroups (lists/rows)
Alice: 91%, 87%, 85%, 84%, 79%, 78%, 60%, 59%, 58%, 57%, 20%, 19%, …
Personalized Emails
Mobile Recommendations
Inventory Optimization
Busy restaurants
We can help optimize their schedule.
Seat Most Diners Every Day
The average restaurant has tables sitting empty between turns; it could accommodate an additional 4,580 diners per year. At $45/guest (average cost per meal), that's about $200K.
But squeezing the most out of every seat seems impossible…
This reservation prevents an earlier one
If the 'turn time' for a party of 2 at the restaurant is 2 hours, the 'Bad Reso' starting at 7:45pm prevents a reservation at 6pm. If we could have asked the user to shift their reservation by a mere 15 minutes (to start at 8pm), this would have opened up an entire new turn on the table (starting at 6pm).
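To make the turn-time arithmetic concrete, here is a minimal Python sketch (hypothetical helper name, not OpenTable's production logic) that flags a reservation whose start time leaves a wasted partial turn before it:

```python
from datetime import datetime, timedelta

def blocks_earlier_turn(proposed_start, service_open, turn_time=timedelta(hours=2)):
    """Return True if the gap before the proposed reservation is not a whole
    number of turns, i.e., some table time before it is wasted."""
    gap = proposed_start - service_open              # idle time before this booking
    return timedelta(0) < gap % turn_time < turn_time

open_at = datetime(2017, 1, 1, 18, 0)                # service starts at 6:00pm
bad_reso = datetime(2017, 1, 1, 19, 45)              # 7:45pm: wastes a 1h45m gap
good_reso = datetime(2017, 1, 1, 20, 0)              # 8:00pm: leaves a clean 2h turn at 6pm
print(blocks_earlier_turn(bad_reso, open_at))        # True
print(blocks_earlier_turn(good_reso, open_at))       # False
```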
Keep the Table Busy the Whole Night
With the later reservation shifted by only 15 minutes, there is now space for an earlier turn.
Possible Reservation
System Prevents Costly Reservations
If we think the diner can book at 7:15pm, we restrict the offered times that would prevent an earlier turn:
Accepting a 7:45pm reservation will prevent an extra turn on that table.
No Insight into Impact of Accepting a Reso
Restaurant staff knows impact of reservation.
Tetris Shows which Resos Cost a Turn
Simulator: No Restrictions (2 turns) vs. Winning Policy (3 turns).
Techniques for Cover Prediction
Cover Prediction: what? why? how?
• Predict future covers of a specific restaurant.
• The predicted covers are used in calculating lift.
• Predictions: time series and ML models.
[Figure: covers over time. Real past observations $x_0, x_1, x_2, \dots, x_n$; future predictions $F_{n+1}, F_{n+2}, \dots, F_{n+k}$.]
Lift
Lift = Average percentage difference between the observed and predicted covers.
[Figure: covers over time. Training period: real covers. Future: real covers with the new system ("how the new system did") vs. predictions for the old system ("how the old system would have done"); lift averages the difference between the two.]
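A minimal sketch of the lift computation as defined above (hypothetical inputs; assumes observed and predicted covers are aligned by day):

```python
import numpy as np

def lift(observed, predicted):
    """Average percentage difference between observed covers (new system)
    and the covers the old system's model predicts for the same days."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((observed - predicted) / predicted) * 100.0

# Hypothetical week of covers: what actually happened vs. the old-system forecast.
print(lift(observed=[120, 95, 130, 160, 210, 240, 180],
           predicted=[110, 90, 125, 150, 200, 225, 170]))  # ≈ 6.1% lift
```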
Average
• Predictions = Average of all existing covers.
$$F_{n+j} = \frac{1}{n+1}\sum_{i=0}^{n} x_i, \qquad j \in \{1, 2, \dots\}$$
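A minimal sketch of this baseline on hypothetical data: every future day is forecast as the mean of all past covers.

```python
import numpy as np

covers = np.array([120, 95, 130, 160, 210, 240, 180])  # hypothetical daily covers
forecast = covers.mean()                               # F_{n+j}: same value for every future day
print(forecast)
```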
Moving Average
• Predictions: average of the previous k values.
• Sliding window: older data points are not used.
Smoothing: $x'_t = \frac{1}{k}\sum_{i=0}^{k-1} x_{t-i}, \quad t \in \{k-1, k, \dots\}$
Forecast: $F_{n+j} = x'_n, \quad j \in \{1, \dots, k\}$
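A minimal sketch of the sliding-window forecast (hypothetical data; k = 7 chosen arbitrarily):

```python
import numpy as np

def moving_average_forecast(covers, k=7):
    """Forecast future covers as the average of the last k observed values."""
    covers = np.asarray(covers, dtype=float)
    return covers[-k:].mean()

covers = [120, 95, 130, 160, 210, 240, 180, 125, 100, 135]  # hypothetical daily covers
print(moving_average_forecast(covers, k=7))                 # mean of the last 7 days
```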
Exponential Average
• Predictions: combine existing covers, giving exponentially lower weights to older covers.
• The importance given to recent vs. older covers is controlled by $\alpha$.
Initialization: $x'_0 = x_0$
Smoothing: $x'_i = \alpha \cdot x_i + (1-\alpha) \cdot x'_{i-1}, \quad i \in \{1, \dots, n\}$
Forecast: $F_{n+j} = x'_n, \quad j \in \{1, \dots, k\}$
(Robert Brown, Charles Holt)
Example: Exponential Average
$x'_0 = x_0$
$x'_1 = \alpha \cdot x_1 + (1-\alpha) \cdot x_0$
$x'_2 = \alpha \cdot x_2 + (1-\alpha) \cdot \alpha \cdot x_1 + (1-\alpha)^2 \cdot x_0$
$x'_3 = \alpha \cdot x_3 + (1-\alpha) \cdot \alpha \cdot x_2 + (1-\alpha)^2 \cdot \alpha \cdot x_1 + (1-\alpha)^3 \cdot x_0$
$\vdots$
$x'_n = \alpha \cdot x_n + (1-\alpha)^1 \cdot \alpha \cdot x_{n-1} + (1-\alpha)^2 \cdot \alpha \cdot x_{n-2} + \dots + (1-\alpha)^{n-2} \cdot \alpha \cdot x_1 + (1-\alpha)^{n-1} \cdot x_0$
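A minimal sketch of single exponential smoothing as defined above (hypothetical data; α = 0.3 is an assumed value):

```python
def exponential_smoothing_forecast(covers, alpha=0.3):
    """Single exponential smoothing: x'_i = alpha*x_i + (1-alpha)*x'_{i-1}.
    The forecast for every future day is the last smoothed value x'_n."""
    smoothed = covers[0]                       # x'_0 = x_0
    for x in covers[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

covers = [120, 95, 130, 160, 210, 240, 180, 125, 100, 135]  # hypothetical daily covers
print(exponential_smoothing_forecast(covers, alpha=0.3))
```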
Holt Winters 2D
• Take into account the previous value and the trend.
• The trend is the slope between the current and previous points.
• $\alpha$ and $\beta$ control the weight given to the current point and the trend.
(double exponential smoothing)
Initialization: $l_1 = x_1; \quad b_1 = x_1 - x_0$
Level: $l_i = \alpha \cdot x_i + (1-\alpha) \cdot (l_{i-1} + b_{i-1}), \quad i \in \{2, \dots, n\}$
Trend: $b_i = \beta \cdot (l_i - l_{i-1}) + (1-\beta) \cdot b_{i-1}, \quad i \in \{2, \dots, n\}$
Smoothing: $x'_i = l_i + b_i$
Forecast: $F_{n+j} = x'_n + j \cdot b_n$
Example: Holt Winters 2D
$l_1 = x_1; \quad b_1 = x_1 - x_0$ (initialization)
$l_2 = \alpha \cdot x_2 + (1-\alpha) \cdot (l_1 + b_1) = \alpha \cdot x_2 + (1-\alpha) \cdot (x_1 + (x_1 - x_0)) = \dots$
$b_2 = \beta \cdot (l_2 - l_1) + (1-\beta) \cdot b_1 = \dots$
$x'_2 = l_2 + b_2$
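A minimal sketch of the double exponential smoothing recursion above (hypothetical data; α = 0.5 and β = 0.3 are assumed values):

```python
def holt_2d_forecast(covers, alpha=0.5, beta=0.3, horizon=3):
    """Double exponential smoothing (Holt): level + trend, then extrapolate."""
    level, trend = covers[1], covers[1] - covers[0]               # l_1 = x_1, b_1 = x_1 - x_0
    for x in covers[2:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)         # l_i
        trend = beta * (level - prev_level) + (1 - beta) * trend  # b_i
    smoothed = level + trend                                      # x'_n = l_n + b_n
    return [smoothed + j * trend for j in range(1, horizon + 1)]  # F_{n+j} = x'_n + j*b_n

covers = [120, 95, 130, 160, 210, 240, 180, 125, 100, 135]  # hypothetical daily covers
print(holt_2d_forecast(covers))
```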
Holt Winters 3D
• Predictions take into account the previous value, the trend in covers, and seasonality.
• The trend is the slope between the current and the previous point.
• Seasonality takes into account the average of every k-th point in the season; in our case a season is 7 points, or 1 week.
• $\alpha$, $\beta$ and $\gamma$ control the weight given to the current point, the trend and the seasonality.
(triple exponential smoothing, with additive seasonality)
Level: $l_i = \alpha \cdot (x_i - s_{i-L}) + (1-\alpha) \cdot (l_{i-1} + b_{i-1})$
Trend: $b_i = \beta \cdot (l_i - l_{i-1}) + (1-\beta) \cdot b_{i-1}$
Seasonality: $s_i = \gamma \cdot (x_i - l_i) + (1-\gamma) \cdot s_{i-L}$
Forecast: $F_{n+j} = l_n + j \cdot b_n + s_{n-L+1+(j-1) \bmod L}$
(Peter Winters)
Holt Winters 3D with Seasonality
L = season length = 1 week
[Figure: observed covers vs. the seasonal component and the algorithm's predictions over time; the 1-week season is marked.]
Initialization: $l_0 = x_0$
$$b_0 = \frac{1}{L}\left(\frac{x_{L+1}-x_1}{L} + \frac{x_{L+2}-x_2}{L} + \dots + \frac{x_{L+L}-x_L}{L}\right)$$
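A minimal sketch of additive Holt-Winters (triple exponential smoothing) following the update equations above. Hypothetical data; α, β, γ are assumed values; the initialization used here (first-season mean as the level, average week-over-week change as the trend, deviations from the first-season mean as seasonal indices) is one common choice and differs slightly from the slide's $l_0 = x_0$.

```python
def holt_winters_additive(covers, season_len=7, alpha=0.5, beta=0.1, gamma=0.3, horizon=7):
    """Triple exponential smoothing with additive seasonality (level, trend, season)."""
    L = season_len
    # Initialization (one common choice; see lead-in note).
    level = sum(covers[:L]) / L
    trend = sum((covers[L + i] - covers[i]) / L for i in range(L)) / L
    seasonal = [covers[i] - level for i in range(L)]

    for i in range(L, len(covers)):
        x = covers[i]
        prev_level = level
        level = alpha * (x - seasonal[i % L]) + (1 - alpha) * (level + trend)  # l_i
        trend = beta * (level - prev_level) + (1 - beta) * trend               # b_i
        seasonal[i % L] = gamma * (x - level) + (1 - gamma) * seasonal[i % L]  # s_i

    n = len(covers)
    # F_{n+j} = l_n + j*b_n + seasonal index of the matching weekday.
    return [level + j * trend + seasonal[(n + j - 1) % L] for j in range(1, horizon + 1)]

# Two hypothetical weeks of daily covers (weekend peaks), forecasting the next week.
covers = [120, 95, 130, 160, 210, 240, 180,
          125, 100, 135, 165, 215, 245, 185]
print(holt_winters_additive(covers))
```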
Calculating Hyperparameters
• Minimize an objective function: Root Mean Square Error (RMSE); it depends on $\alpha$, $\beta$, $\gamma$.
• Nelder-Mead heuristic search method.
• A simplex is a polytope of n + 1 vertices in n dimensions.
• At each step we do: reflection, expansion, contraction or shrinkage.
(John A. Nelder & Roger Mead)
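A minimal sketch of fitting the smoothing parameters with Nelder-Mead via SciPy (assumes the `holt_winters_additive` function sketched earlier; hypothetical data; not OpenTable's production code):

```python
import numpy as np
from scipy.optimize import minimize

def rmse_objective(params, covers, season_len=7, holdout=7):
    """RMSE of a Holt-Winters forecast on the last `holdout` days, as a function of (alpha, beta, gamma)."""
    alpha, beta, gamma = np.clip(params, 0.0, 1.0)
    train, test = covers[:-holdout], covers[-holdout:]
    forecast = holt_winters_additive(train, season_len, alpha, beta, gamma, horizon=holdout)
    return float(np.sqrt(np.mean((np.array(test) - np.array(forecast)) ** 2)))

covers = [120, 95, 130, 160, 210, 240, 180] * 4   # four hypothetical weeks of covers
result = minimize(rmse_objective, x0=[0.5, 0.1, 0.3], args=(covers,), method="Nelder-Mead")
print(result.x, result.fun)                        # fitted (alpha, beta, gamma) and RMSE
```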
Nelder – Mead: Reflection
Objective function: $f(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^n$ (e.g., RMSE).
Initial test points: $\mathbf{x}_1, \dots, \mathbf{x}_{n+1} \in \mathbb{R}^n$. Sort: $f(\mathbf{x}_1) < \dots < f(\mathbf{x}_{n+1})$.
Centroid: $\mathbf{x}_0 \leftarrow \dfrac{\mathbf{x}_1 + \dots + \mathbf{x}_n}{n}$
Reflected point: $\mathbf{x}_r \leftarrow \mathbf{x}_0 + \alpha \cdot (\mathbf{x}_0 - \mathbf{x}_{n+1})$, $\alpha > 0$. A good value for $\alpha$ is 1.
If $f(\mathbf{x}_1) \le f(\mathbf{x}_r) < f(\mathbf{x}_{n+1})$ then $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_r$.
Nelder – Mead: Expansion
Case: the reflected point is the best point so far: $f(\mathbf{x}_r) < f(\mathbf{x}_1) < \dots < f(\mathbf{x}_{n+1})$.
Expanded point: $\mathbf{x}_e \leftarrow \mathbf{x}_r + \gamma \cdot (\mathbf{x}_r - \mathbf{x}_0)$, $\gamma > 0$. A good value for $\gamma$ is 2.
If $f(\mathbf{x}_e) < f(\mathbf{x}_r)$ then $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_e$, else $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_r$.
Overloaded notation: $\alpha, \gamma$ for Nelder-Mead are different from those in Holt-Winters!
Nelder – Mead: Contraction
Case: the reflected point is still worse than the worst point: $f(\mathbf{x}_{n+1}) < f(\mathbf{x}_r)$.
Contracted point: $\mathbf{x}_c \leftarrow \mathbf{x}_0 + \rho \cdot (\mathbf{x}_{n+1} - \mathbf{x}_0)$, $0 < \rho \le 0.5$. A good value for $\rho$ is 0.5.
If $f(\mathbf{x}_c) < f(\mathbf{x}_{n+1})$ then $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_c$.
Nelder – Mead: Shrink
Case: neither the reflected nor the contracted point is good: $f(\mathbf{x}_1) < \dots < f(\mathbf{x}_{n+1}) < f(\mathbf{x}_r), f(\mathbf{x}_c)$.
Keep the best point $\mathbf{x}_1$ and move all the other points towards it:
$\mathbf{x}_i \leftarrow \mathbf{x}_i + \sigma \cdot (\mathbf{x}_1 - \mathbf{x}_i), \quad i \in \{2, 3, \dots, n+1\}$, $0 < \sigma < 1$. A good value for $\sigma$ is 0.5.
Baseline: Linear Regression
Model: $\mathbf{y} = X\beta$ (features $X$, weights $\beta$).
Parameter estimation: $\hat{\beta} = (X'X)^{-1}X'\mathbf{y}$
Prediction: $\mathbf{x}'\hat{\beta}$
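A minimal sketch of the normal-equation estimate above (hypothetical features such as a day-of-week index and lagged covers; NumPy only):

```python
import numpy as np

# Hypothetical design matrix: bias, day-of-week index, covers from one week earlier.
X = np.array([[1, 0, 120], [1, 1, 95], [1, 2, 130], [1, 3, 160],
              [1, 4, 210], [1, 5, 240], [1, 6, 180]], dtype=float)
y = np.array([125, 100, 135, 165, 215, 245, 185], dtype=float)  # observed covers

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # beta_hat = (X'X)^{-1} X'y
x_new = np.array([1, 0, 125], dtype=float)     # features for the day to predict
print(x_new @ beta_hat)                        # prediction: x' beta_hat
```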
Improvement
Model progression: baseline (2-week average) → linear regression → linear regression + Support Vector Machines / Gradient Boosting Trees.
Of the restaurants on the latest platform, we now have:
• 22% of restaurants with less than 5% error;
• 32% of restaurants with less than 10% error.
[Chart: number of points predicted well, per model.]
[Figure: covers-over-time forecasts side by side for each family of techniques: Average / Moving Average; Exponential Moving Average / Holt Winters 2D; Holt Winters 3D / Linear Regression / SVM / GBT.]
Next: Similar Restaurants
• Cluster restaurants based on price, capacity, metro and the average number of covers on a specific day.
• Add the number of reservations of similar restaurants as a feature in the previously discussed models.
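A minimal sketch of such a clustering step (hypothetical features and cluster count; scikit-learn KMeans shown as one reasonable choice, not necessarily what OpenTable uses):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical restaurant features: [price tier, capacity, metro id, avg covers on that weekday].
features = np.array([[2, 40, 0, 85],
                     [3, 120, 0, 210],
                     [2, 45, 1, 90],
                     [4, 150, 1, 260],
                     [1, 30, 0, 55]], dtype=float)

scaled = StandardScaler().fit_transform(features)   # put features on a comparable scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(labels)  # restaurants in the same cluster are "similar" for feature engineering
```

In practice a categorical attribute such as metro would be one-hot encoded rather than used as a raw id; it is kept as a single column here only to keep the sketch short.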
Summary
• OpenTable reservations guarantee a spot at the restaurant.
• Data Science: Search / Recommendations / Advertising; Inventory optimization (time series for cover prediction, optimization).
• Simple, yet effective techniques you can apply.
Acknowledgements
Bhanu Agarwal, Chris Gould, Corey Reese, Cormac Twomey, David Amusin, Eli Chait, Igor Gammer, Joseph Essas, Josh Polsky, Katrin Tomanek, Mats Einarsen, Michael Huang, Olivier Larivain, Pablo Delgado, Pavel Syrtsov, Sergei Radutnuy, Sravani Kamisetty, Steve Annessa, Utkarsh Sengar, William Wu