Data Science for the Hospitality Domain
Dr. Nicolas Nicolov
Sr. Director, Head of Data Science, OpenTable, Inc.
1 Montgomery St., San Francisco, CA 94104, U.S.A.
OpenTable
• Seated over 1B diners since 1999; $45B spent at partner restaurants.
• 20M diners / month.
• 42M reviews created since 2008 (650K reviews / month).
• 600 partners: Google, TripAdvisor, Bing, Yahoo, Zagat, Eater …
Part of the Priceline group:
US: 24,194 reservable / 66,109 total
UK: 5,389 reservable / 6,832 total
World-wide: 37,861 reservable / 87,328 total
Restaurant Cuisines: Top 5 Cities & Globally

World-wide: 1. Italian, 2. Seafood, 3. American, 4. Steak, 5. Japanese
New York City: 1. Italian, 2. American, 3. Japanese, 4. Seafood, 5. French
London: 1. Italian, 2. Japanese, 3. Indian, 4. Steak, 5. Asian
San Francisco: 1. Italian, 2. Seafood, 3. American, 4. Steak, 5. Japanese
Chicago: 1. Italian, 2. American, 3. Steak, 4. Seafood, 5. Steakhouse
Washington DC: 1. American, 2. Italian, 3. Seafood, 4. Contemporary Am., 5. Steak
Mobile First
More than 50% of reservations are made on mobile.
Discovery tab / Collections:
• iOS launched: June 2016.
• Android launched: Nov 7, 2016.
Collections:
Data Science at OpenTable
Areas:
• Autocomplete.
• Search (indexing, ranking).
• Recommendations.
• Inventory optimization.
• Advertising / promoted inventory.
• Content analysis.

Projects:
• Autocomplete.
• Tagging.
• Cuisine / menu analysis.
• Search (all platforms).
• Similarity: user-user / restaurant-restaurant.
• Recommendations (Web, Collections, Emails; explanations).
• Inventory optimization (cover/demand prediction, simulation, tracking lift).
• Sentiment: review analysis.
• Review selection.
• SEO: Points Of Interest (POIs).
• Wait time prediction.
• Turn time prediction.
Search
Autocomplete: location names / cuisines / tags
Search pipeline: retrieval, facets, ranking, tags → ranked search results.
Facets ~ search keywords (e.g., dishes).
Frequent queries: iPhone / iPad
Time to book
[Chart: distribution of time-to-book. Annotations: "People have to sleep at some point"; "20 days in advance".]
Hierarchical Cuisines
Machine Learning Ranking
Recommendations
Personalized Restaurant Ranking
Alice: 91%, 87%, 85%, 84%, 79%, 78%, 60%, 59%, 58%, 57%, 20%, 19%, …
Bob: 95%, 91%, 87%, 85%, 80%, 78%, 71%, 69%, 61%, 57%, 12%, 10%, …
Topic Models: fingerprints for restaurants, from our diners' perspective.
[Word cloud of topic terms: Italian food, pizza, wine, waiters, expensive.]
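As a rough illustration of how such per-restaurant fingerprints could be derived from review text, here is a minimal topic-model sketch (hypothetical reviews, scikit-learn LDA; the topic count and preprocessing are assumptions, not OpenTable's pipeline):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical review snippets for one restaurant.
reviews = [
    "great italian food, amazing pizza and a long wine list",
    "waiters were friendly but the wine was expensive",
    "expensive for pizza, though the italian dishes were excellent",
]
counts = CountVectorizer(stop_words="english").fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# The restaurant's "fingerprint": its average topic distribution across reviews.
print(lda.transform(counts).mean(axis=0))
```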
Ingredients of a Recommendation Engine
Personalized subgroups (lists/rows)
Alice: 91%, 87%, 85%, 84%, 79%, 78%, 60%, 59%, 58%, 57%, 20%, 19%, …
Personalized Emails
Mobile Recommendations
Inventory Optimization
Busy restaurants
We can help optimize their schedule.
Seat Most Diners Every Day
The average restaurant has tables sitting empty between turns; it could accommodate an additional 4,580 diners per year. At $45/guest (average cost per meal), that's about $200K.
But squeezing the most out of every seat seems impossible…
This reservation prevents an earlier one
If the 'turn time' for a party of 2 at the restaurant is 2 hours, the 'Bad Reso' starting at 7:45pm prevents a reservation at 6pm. If we could have asked the user to shift their reservation by a mere 15 minutes (to start at 8pm), this would have opened up an entire new turn on the table (starting at 6pm).
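To make the turn-time arithmetic concrete, here is a minimal Python sketch (hypothetical helper name, not OpenTable's production logic) that flags a reservation whose start time leaves a wasted partial turn before it:

```python
from datetime import datetime, timedelta

def blocks_earlier_turn(proposed_start, service_open, turn_time=timedelta(hours=2)):
    """Return True if the gap before the proposed reservation is not a whole
    number of turns, i.e., some table time before it is wasted."""
    gap = proposed_start - service_open              # idle time before this booking
    return timedelta(0) < gap % turn_time < turn_time

open_at = datetime(2017, 1, 1, 18, 0)                # service starts at 6:00pm
bad_reso = datetime(2017, 1, 1, 19, 45)              # 7:45pm: wastes a 1h45m gap
good_reso = datetime(2017, 1, 1, 20, 0)              # 8:00pm: leaves a clean 2h turn at 6pm
print(blocks_earlier_turn(bad_reso, open_at))        # True
print(blocks_earlier_turn(good_reso, open_at))       # False
```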
Keep the Table Busy the Whole Night
With the later reservation shifted by only 15 minutes, there is now space for an earlier turn.
Possible Reservation
System Prevents Costly Reservations
If we think the diner can book at 7:15pm, we restrict the offered times that would prevent an earlier turn:
Accepting a 7:45pm reservation will prevent an extra turn on that table.
No Insight into Impact of Accepting a Reso
Restaurant staff knows impact of reservation.
Tetris Shows which Resos Cost a Turn
Simulator: No Restrictions (2 turns) vs. Winning Policy (3 turns).
Techniques for Cover Prediction
Cover Prediction: what? why? how?
• Predict future covers of a specific restaurant.
• The predicted covers are used in calculating lift.
• Predictions: time series and ML models.
[Figure: covers over time. Real past observations $x_0, x_1, x_2, \dots, x_n$; future predictions $F_{n+1}, F_{n+2}, \dots, F_{n+k}$.]
Lift
Lift = Average percentage difference between the observed and predicted covers.
[Figure: covers over time. Training period: real covers. Future: real covers with the new system ("how the new system did") vs. predictions for the old system ("how the old system would have done"); lift averages the difference between the two.]
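A minimal sketch of the lift computation as defined above (hypothetical inputs; assumes observed and predicted covers are aligned by day):

```python
import numpy as np

def lift(observed, predicted):
    """Average percentage difference between observed covers (new system)
    and the covers the old system's model predicts for the same days."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((observed - predicted) / predicted) * 100.0

# Hypothetical week of covers: what actually happened vs. the old-system forecast.
print(lift(observed=[120, 95, 130, 160, 210, 240, 180],
           predicted=[110, 90, 125, 150, 200, 225, 170]))  # ≈ 6.1% lift
```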
Average
• Predictions = Average of all existing covers.
$$F_{n+j} = \frac{1}{n+1}\sum_{i=0}^{n} x_i, \qquad j \in \{1, 2, \dots\}$$
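A minimal sketch of this baseline on hypothetical data: every future day is forecast as the mean of all past covers.

```python
import numpy as np

covers = np.array([120, 95, 130, 160, 210, 240, 180])  # hypothetical daily covers
forecast = covers.mean()                               # F_{n+j}: same value for every future day
print(forecast)
```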
Moving Average
• Predictions: average of the previous k values.
• Sliding window: older data points are not used.
Smoothing: $x'_t = \frac{1}{k}\sum_{i=0}^{k-1} x_{t-i}, \quad t \in \{k-1, k, \dots\}$
Forecast: $F_{n+j} = x'_n, \quad j \in \{1, \dots, k\}$
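A minimal sketch of the sliding-window forecast (hypothetical data; k = 7 chosen arbitrarily):

```python
import numpy as np

def moving_average_forecast(covers, k=7):
    """Forecast future covers as the average of the last k observed values."""
    covers = np.asarray(covers, dtype=float)
    return covers[-k:].mean()

covers = [120, 95, 130, 160, 210, 240, 180, 125, 100, 135]  # hypothetical daily covers
print(moving_average_forecast(covers, k=7))                 # mean of the last 7 days
```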
Exponential Average
• Predictions: combine existing covers, giving exponentially lower weights to older covers.
• The importance given to recent vs. older covers is controlled by $\alpha$.
Initialization: $x'_0 = x_0$
Smoothing: $x'_i = \alpha \cdot x_i + (1-\alpha) \cdot x'_{i-1}, \quad i \in \{1, \dots, n\}$
Forecast: $F_{n+j} = x'_n, \quad j \in \{1, \dots, k\}$
(Robert Brown, Charles Holt)
Example: Exponential Average
$x'_0 = x_0$
$x'_1 = \alpha \cdot x_1 + (1-\alpha) \cdot x_0$
$x'_2 = \alpha \cdot x_2 + (1-\alpha) \cdot \alpha \cdot x_1 + (1-\alpha)^2 \cdot x_0$
$x'_3 = \alpha \cdot x_3 + (1-\alpha) \cdot \alpha \cdot x_2 + (1-\alpha)^2 \cdot \alpha \cdot x_1 + (1-\alpha)^3 \cdot x_0$
$\vdots$
$x'_n = \alpha \cdot x_n + (1-\alpha)^1 \cdot \alpha \cdot x_{n-1} + (1-\alpha)^2 \cdot \alpha \cdot x_{n-2} + \dots + (1-\alpha)^{n-2} \cdot \alpha \cdot x_1 + (1-\alpha)^{n-1} \cdot x_0$
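A minimal sketch of single exponential smoothing as defined above (hypothetical data; α = 0.3 is an assumed value):

```python
def exponential_smoothing_forecast(covers, alpha=0.3):
    """Single exponential smoothing: x'_i = alpha*x_i + (1-alpha)*x'_{i-1}.
    The forecast for every future day is the last smoothed value x'_n."""
    smoothed = covers[0]                       # x'_0 = x_0
    for x in covers[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

covers = [120, 95, 130, 160, 210, 240, 180, 125, 100, 135]  # hypothetical daily covers
print(exponential_smoothing_forecast(covers, alpha=0.3))
```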
Holt Winters 2D
• Take into account the previous value and the trend.
• The trend is the slope between the current and previous points.
• $\alpha$ and $\beta$ control the weight given to the current point and the trend.
(double exponential smoothing)
Initialization: $l_1 = x_1; \quad b_1 = x_1 - x_0$
Level: $l_i = \alpha \cdot x_i + (1-\alpha) \cdot (l_{i-1} + b_{i-1}), \quad i \in \{2, \dots, n\}$
Trend: $b_i = \beta \cdot (l_i - l_{i-1}) + (1-\beta) \cdot b_{i-1}, \quad i \in \{2, \dots, n\}$
Smoothing: $x'_i = l_i + b_i$
Forecast: $F_{n+j} = x'_n + j \cdot b_n$
Example: Holt Winters 2D
$l_1 = x_1; \quad b_1 = x_1 - x_0$ (initialization)
$l_2 = \alpha \cdot x_2 + (1-\alpha) \cdot (l_1 + b_1) = \alpha \cdot x_2 + (1-\alpha) \cdot (x_1 + (x_1 - x_0)) = \dots$
$b_2 = \beta \cdot (l_2 - l_1) + (1-\beta) \cdot b_1 = \dots$
$x'_2 = l_2 + b_2$
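A minimal sketch of the double exponential smoothing recursion above (hypothetical data; α = 0.5 and β = 0.3 are assumed values):

```python
def holt_2d_forecast(covers, alpha=0.5, beta=0.3, horizon=3):
    """Double exponential smoothing (Holt): level + trend, then extrapolate."""
    level, trend = covers[1], covers[1] - covers[0]               # l_1 = x_1, b_1 = x_1 - x_0
    for x in covers[2:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)         # l_i
        trend = beta * (level - prev_level) + (1 - beta) * trend  # b_i
    smoothed = level + trend                                      # x'_n = l_n + b_n
    return [smoothed + j * trend for j in range(1, horizon + 1)]  # F_{n+j} = x'_n + j*b_n

covers = [120, 95, 130, 160, 210, 240, 180, 125, 100, 135]  # hypothetical daily covers
print(holt_2d_forecast(covers))
```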
Holt Winters 3D
• Predictions take into account the previous value, the trend in covers, and seasonality.
• The trend is the slope between the current and the previous point.
• Seasonality takes into account the average of every k-th point in the season; in our case a season is 7 points, or 1 week.
• $\alpha$, $\beta$ and $\gamma$ control the weight given to the current point, the trend and the seasonality.
(triple exponential smoothing, with additive seasonality)
Level: $l_i = \alpha \cdot (x_i - s_{i-L}) + (1-\alpha) \cdot (l_{i-1} + b_{i-1})$
Trend: $b_i = \beta \cdot (l_i - l_{i-1}) + (1-\beta) \cdot b_{i-1}$
Seasonality: $s_i = \gamma \cdot (x_i - l_i) + (1-\gamma) \cdot s_{i-L}$
Forecast: $F_{n+j} = l_n + j \cdot b_n + s_{n-L+1+(j-1) \bmod L}$
(Peter Winters)
Holt Winters 3D with Seasonality
L = season length = 1 week
[Figure: observed covers vs. the seasonal component and the algorithm's predictions over time; the 1-week season is marked.]
Initialization: $l_0 = x_0$
$$b_0 = \frac{1}{L}\left(\frac{x_{L+1}-x_1}{L} + \frac{x_{L+2}-x_2}{L} + \dots + \frac{x_{L+L}-x_L}{L}\right)$$
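A minimal sketch of additive Holt-Winters (triple exponential smoothing) following the update equations above. Hypothetical data; α, β, γ are assumed values; the initialization used here (first-season mean as the level, average week-over-week change as the trend, deviations from the first-season mean as seasonal indices) is one common choice and differs slightly from the slide's $l_0 = x_0$.

```python
def holt_winters_additive(covers, season_len=7, alpha=0.5, beta=0.1, gamma=0.3, horizon=7):
    """Triple exponential smoothing with additive seasonality (level, trend, season)."""
    L = season_len
    # Initialization (one common choice; see lead-in note).
    level = sum(covers[:L]) / L
    trend = sum((covers[L + i] - covers[i]) / L for i in range(L)) / L
    seasonal = [covers[i] - level for i in range(L)]

    for i in range(L, len(covers)):
        x = covers[i]
        prev_level = level
        level = alpha * (x - seasonal[i % L]) + (1 - alpha) * (level + trend)  # l_i
        trend = beta * (level - prev_level) + (1 - beta) * trend               # b_i
        seasonal[i % L] = gamma * (x - level) + (1 - gamma) * seasonal[i % L]  # s_i

    n = len(covers)
    # F_{n+j} = l_n + j*b_n + seasonal index of the matching weekday.
    return [level + j * trend + seasonal[(n + j - 1) % L] for j in range(1, horizon + 1)]

# Two hypothetical weeks of daily covers (weekend peaks), forecasting the next week.
covers = [120, 95, 130, 160, 210, 240, 180,
          125, 100, 135, 165, 215, 245, 185]
print(holt_winters_additive(covers))
```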
Calculating Hyperparameters
• Minimize an objective function: Root Mean Square Error (RMSE); it depends on $\alpha$, $\beta$, $\gamma$.
• Nelder-Mead heuristic search method.
• A simplex is a polytope of n + 1 vertices in n dimensions.
• At each step we do: reflection, expansion, contraction or shrinkage.
(John A. Nelder & Roger Mead)
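A minimal sketch of fitting the smoothing parameters with Nelder-Mead via SciPy (assumes the `holt_winters_additive` function sketched earlier; hypothetical data; not OpenTable's production code):

```python
import numpy as np
from scipy.optimize import minimize

def rmse_objective(params, covers, season_len=7, holdout=7):
    """RMSE of a Holt-Winters forecast on the last `holdout` days, as a function of (alpha, beta, gamma)."""
    alpha, beta, gamma = np.clip(params, 0.0, 1.0)
    train, test = covers[:-holdout], covers[-holdout:]
    forecast = holt_winters_additive(train, season_len, alpha, beta, gamma, horizon=holdout)
    return float(np.sqrt(np.mean((np.array(test) - np.array(forecast)) ** 2)))

covers = [120, 95, 130, 160, 210, 240, 180] * 4   # four hypothetical weeks of covers
result = minimize(rmse_objective, x0=[0.5, 0.1, 0.3], args=(covers,), method="Nelder-Mead")
print(result.x, result.fun)                        # fitted (alpha, beta, gamma) and RMSE
```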
Nelder – Mead: Reflection
Objective function: $f(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^n$ (e.g., RMSE).
Initial test points: $\mathbf{x}_1, \dots, \mathbf{x}_{n+1} \in \mathbb{R}^n$. Sort: $f(\mathbf{x}_1) < \dots < f(\mathbf{x}_{n+1})$.
Centroid: $\mathbf{x}_0 \leftarrow \dfrac{\mathbf{x}_1 + \dots + \mathbf{x}_n}{n}$
Reflected point: $\mathbf{x}_r \leftarrow \mathbf{x}_0 + \alpha \cdot (\mathbf{x}_0 - \mathbf{x}_{n+1})$, $\alpha > 0$. A good value for $\alpha$ is 1.
If $f(\mathbf{x}_1) \le f(\mathbf{x}_r) < f(\mathbf{x}_{n+1})$ then $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_r$.
Nelder – Mead: Expansion
Case: the reflected point is the best point so far: $f(\mathbf{x}_r) < f(\mathbf{x}_1) < \dots < f(\mathbf{x}_{n+1})$.
Expanded point: $\mathbf{x}_e \leftarrow \mathbf{x}_r + \gamma \cdot (\mathbf{x}_r - \mathbf{x}_0)$, $\gamma > 0$. A good value for $\gamma$ is 2.
If $f(\mathbf{x}_e) < f(\mathbf{x}_r)$ then $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_e$, else $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_r$.
Overloaded notation: $\alpha, \gamma$ for Nelder-Mead are different from those in Holt-Winters!
Nelder – Mead: Contraction
Case: the reflected point is still worse than the worst point: $f(\mathbf{x}_{n+1}) < f(\mathbf{x}_r)$.
Contracted point: $\mathbf{x}_c \leftarrow \mathbf{x}_0 + \rho \cdot (\mathbf{x}_{n+1} - \mathbf{x}_0)$, $0 < \rho \le 0.5$. A good value for $\rho$ is 0.5.
If $f(\mathbf{x}_c) < f(\mathbf{x}_{n+1})$ then $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_c$.
Nelder – Mead: Shrink
Case: neither the reflected nor the contracted point is good: $f(\mathbf{x}_1) < \dots < f(\mathbf{x}_{n+1}) < f(\mathbf{x}_r), f(\mathbf{x}_c)$.
Keep the best point $\mathbf{x}_1$ and move all the other points towards it:
$\mathbf{x}_i \leftarrow \mathbf{x}_i + \sigma \cdot (\mathbf{x}_1 - \mathbf{x}_i), \quad i \in \{2, 3, \dots, n+1\}$, $0 < \sigma < 1$. A good value for $\sigma$ is 0.5.
Baseline: Linear Regression
Model: $\mathbf{y} = X\beta$ (features $X$, weights $\beta$).
Parameter estimation: $\hat{\beta} = (X'X)^{-1}X'\mathbf{y}$
Prediction: $\mathbf{x}'\hat{\beta}$
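A minimal sketch of the normal-equation estimate above (hypothetical features such as a day-of-week index and lagged covers; NumPy only):

```python
import numpy as np

# Hypothetical design matrix: bias, day-of-week index, covers from one week earlier.
X = np.array([[1, 0, 120], [1, 1, 95], [1, 2, 130], [1, 3, 160],
              [1, 4, 210], [1, 5, 240], [1, 6, 180]], dtype=float)
y = np.array([125, 100, 135, 165, 215, 245, 185], dtype=float)  # observed covers

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # beta_hat = (X'X)^{-1} X'y
x_new = np.array([1, 0, 125], dtype=float)     # features for the day to predict
print(x_new @ beta_hat)                        # prediction: x' beta_hat
```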
Improvement
Model progression: baseline (2-week average) → linear regression → linear regression + Support Vector Machines / Gradient Boosting Trees.
Of the restaurants on the latest platform, we now have:
• 22% of restaurants with less than 5% error;
• 32% of restaurants with less than 10% error.
[Chart: number of points predicted well, per model.]
[Figure: covers-over-time forecasts side by side for each family of techniques: Average / Moving Average; Exponential Moving Average / Holt Winters 2D; Holt Winters 3D / Linear Regression / SVM / GBT.]
Next: Similar Restaurants
• Cluster restaurants based on price, capacity, metro and the average number of covers on a specific day.
• Add the number of reservations of similar restaurants as a feature in the previously discussed models.
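A minimal sketch of such a clustering step (hypothetical features and cluster count; scikit-learn KMeans shown as one reasonable choice, not necessarily what OpenTable uses):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical restaurant features: [price tier, capacity, metro id, avg covers on that weekday].
features = np.array([[2, 40, 0, 85],
                     [3, 120, 0, 210],
                     [2, 45, 1, 90],
                     [4, 150, 1, 260],
                     [1, 30, 0, 55]], dtype=float)

scaled = StandardScaler().fit_transform(features)   # put features on a comparable scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(labels)  # restaurants in the same cluster are "similar" for feature engineering
```

In practice a categorical attribute such as metro would be one-hot encoded rather than used as a raw id; it is kept as a single column here only to keep the sketch short.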
Summary
• OpenTable reservations guarantee a spot at the restaurant.
• Data Science: Search / Recommendations / Advertising; Inventory optimization (time series for cover prediction, optimization).
• Simple, yet effective techniques you can apply.
Acknowledgements
Bhanu Agarwal, Chris Gould, Corey Reese, Cormac Twomey, David Amusin, Eli Chait, Igor Gammer, Joseph Essas, Josh Polsky, Katrin Tomanek, Mats Einarsen, Michael Huang, Olivier Larivain, Pablo Delgado, Pavel Syrtsov, Sergei Radutnuy, Sravani Kamisetty, Steve Annessa, Utkarsh Sengar, William Wu