L9. Real World Machine Learning - Cooking Predictions
Uploaded by: machine-learning-valencia
Category: Data & Analytics
Transcript of L9. Real World Machine Learning - Cooking Predictions
![Page 1: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/1.jpg)
Cooking Predictions
A real case in the hotel sector
Andrés González, Big Data Prediction Manager
[email protected] Twitter: @data_lytics
![Page 2: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/2.jpg)
CleverTask Solutions SL - Big Data Business Unit 3
Agenda
1. Business Need
2. “Cooking” Predictions
3. Gathering ingredients
4. Cleaning and Transforming
5. The recipe (the model)
6. Tasting the dish
![Page 3: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/3.jpg)
Hotel Sector
• % room occupation
• Cancellation risk
• Income
![Page 4: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/4.jpg)
Business Need
Predict client’s NATIONALITY BEFORE client check-in
![Page 5: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/5.jpg)
Staff Arrangement
Languages
![Page 6: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/6.jpg)
Prepare Activities
![Page 7: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/7.jpg)
Kitchen Arrangement
![Page 8: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/8.jpg)
Customize Stay
![Page 9: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/9.jpg)
In short, because…
… Details Make the Difference
![Page 10: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/10.jpg)
Machine Learning basics
![Page 11: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/11.jpg)
Machine Learning basics
Can you find patterns in this data?
![Page 12: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/12.jpg)
Machine Learning basics
Historical Data -> Training -> Prediction
New Data -> Re-Training
![Page 13: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/13.jpg)
![Page 14: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/14.jpg)
“Cooking” Predictions
1. Go to the market to buy ingredients
2. Cleaning
3. Transforming
4. Cooking
5. Tasting the Dish
![Page 15: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/15.jpg)
“Cooking” Predictions
1. Gathering RAW data
2. Cleaning Data
3. Transforming and Feature Engineering
4. Training the Model
5. Evaluating Prediction Quality
![Page 16: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/16.jpg)
![Page 17: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/17.jpg)
Where does Data come from?
Own Website
Partners Websites
RAW Data
![Page 18: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/18.jpg)
RAW Data
One year of historical reservation data (.xlsx file)
Characteristics
• 260,000 reservations
• 80 fields: 57 categorical, 9 numeric, 10 date, 3 text, 1 incorrect field
• Size: 150 MB
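A field-type inventory like the one on this slide can be approximated with a short script. The following is only a sketch: it assumes the data has already been exported to CSV, that dates are written as `YYYY-MM-DD`, and that a column with at most 50 distinct values counts as categorical. The function names, the sample column names, and the threshold are illustrative and do not come from the talk.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def infer_field_type(values, max_categories=50):
    """Classify one column as numeric, date, categorical, text, or empty."""
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        return "empty"

    def all_parse(parser):
        # True only if every non-empty value is accepted by the parser.
        try:
            for v in present:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "numeric"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    # Assumption: few distinct values means the column is a category.
    if len(set(present)) <= max_categories:
        return "categorical"
    return "text"


def profile_csv(handle):
    """Return (row_count, Counter of inferred field types) for a CSV stream."""
    reader = csv.DictReader(handle)
    columns = {name: [] for name in (reader.fieldnames or [])}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            columns[name].append(row[name] or "")
    types = Counter(infer_field_type(vals) for vals in columns.values())
    return row_count, types


# Tiny in-memory sample with hypothetical column names.
sample = io.StringIO(
    "RESERVATION_DATE,PAX,COUNTRY\n"
    "2015-03-01,2,ES\n"
    "2015-03-04,4,DE\n"
)
rows, types = profile_csv(sample)
print(rows, dict(types))
```

On the real file the heuristic would need tuning, for example for numeric IDs that are really categories.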
![Page 19: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/19.jpg)
RAW Data
![Page 20: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/20.jpg)
![Page 21: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/21.jpg)
The Process
1. Gathering Data -> “Dirty” RAW Data
2. Cleaning -> “Clean” Data
3. Transformation and Feature Engineering -> New Fields, Calculated Fields
4. Model
![Page 22: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/22.jpg)
Data Cleaning
![Page 23: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/23.jpg)
Data Cleaning
![Page 24: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/24.jpg)
Data Cleaning
![Page 25: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/25.jpg)
Data Cleaning
![Page 26: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/26.jpg)
Data Cleaning
![Page 27: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/27.jpg)
Data Cleaning
![Page 28: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/28.jpg)
Data Cleaning
Row Deletion
• Reservations without check-in
• Cancelled reservations
• Rows with errors
Column Deletion
• IDs vs names
• Columns with little data
Other Actions
• Give dates a format
• Delete accents
• Transform .xlsx -> .csv
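The row deletions and the "other actions" on this slide could look roughly like the sketch below. The column names (`CHECKIN_DATE`, `STATUS`, `HOTEL_NAME`), the `CANCELLED` status value, and the `DD/MM/YYYY` input date format are assumptions made for illustration; the talk does not show the real schema. The .xlsx -> .csv conversion is assumed to have happened already.

```python
import unicodedata
from datetime import datetime


def strip_accents(text):
    """Remove diacritics, e.g. 'Andrés' -> 'Andres'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_reservations(rows, date_format="%d/%m/%Y"):
    """Drop unusable rows and normalise the ones that remain."""
    cleaned = []
    for row in rows:
        if not row.get("CHECKIN_DATE"):        # reservation without check-in
            continue
        if row.get("STATUS") == "CANCELLED":    # cancelled reservation
            continue
        try:
            checkin = datetime.strptime(row["CHECKIN_DATE"], date_format)
        except ValueError:                      # row with errors (bad date)
            continue
        fixed = dict(row)
        # Give dates one consistent format (ISO 8601).
        fixed["CHECKIN_DATE"] = checkin.date().isoformat()
        fixed["HOTEL_NAME"] = strip_accents(row.get("HOTEL_NAME", ""))
        cleaned.append(fixed)
    return cleaned


raw = [
    {"CHECKIN_DATE": "15/03/2015", "STATUS": "OK", "HOTEL_NAME": "Hotel València"},
    {"CHECKIN_DATE": "", "STATUS": "OK", "HOTEL_NAME": "No check-in"},
    {"CHECKIN_DATE": "16/03/2015", "STATUS": "CANCELLED", "HOTEL_NAME": "Cancelled"},
    {"CHECKIN_DATE": "31/02/2015", "STATUS": "OK", "HOTEL_NAME": "Bad date"},
]
print(clean_reservations(raw))
```

Only the first sample row survives: the others have no check-in, are cancelled, or carry an impossible date.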
![Page 29: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/29.jpg)
Clean Dataset
Dirty
• 260,000 reservations
• 80 fields: 57 categorical, 9 numeric, 10 date, 3 text, 1 incorrect field
• Size: 150 MB
Clean
• 150,000 reservations
• 46 fields: 26 categorical, 9 numeric, 10 date, 1 text
• Size: 75 MB
![Page 30: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/30.jpg)
The Process
1. Gathering Data -> “Dirty” RAW Data
2. Cleaning -> “Clean” Data
3. Transformations and Feature Engineering -> New Fields, Calculated Fields
4. Model
![Page 31: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/31.jpg)
Transformations
Country Grouping
• Many countries to predict (210)
• Some countries have very few instances
• Grouping objective: each group holds at least 1% of total instances
• Does not affect business objective
• Total number of groups: 20
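One simple way to reach the 1% objective is to fold every country below the threshold into a single catch-all group. This is a sketch of that idea only: the talk does not say how its 20 groups were built (they may well be regional groupings), and the `OTHER` label and function name are made up here.

```python
from collections import Counter


def group_rare_countries(countries, min_share=0.01, other_label="OTHER"):
    """Replace countries holding less than min_share of the rows by one label."""
    total = len(countries)
    if total == 0:
        return []
    counts = Counter(countries)
    keep = {c for c, n in counts.items() if n / total >= min_share}
    return [c if c in keep else other_label for c in countries]


# 200 reservations: Iceland has 1 of them (0.5%), below the 1% threshold.
labels = ["ES"] * 150 + ["DE"] * 49 + ["IS"] * 1
grouped = group_rare_countries(labels)
print(Counter(grouped))
```

Note that the catch-all group itself can still fall below the threshold, so a real grouping would likely merge rare countries with similar ones instead.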
New Fields
• RESERV_ANTICIPATION (calculated): (reservation date - check-in date), i.e. how far ahead of check-in the booking was made
• COUNTRY_HOTEL (name of the country)
• HOTEL_STARS (1-5)
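A minimal sketch of adding these three fields to one reservation follows. The slide writes the calculated field as reservation date minus check-in date; the sketch subtracts the other way round so that the result is a non-negative number of days, which is an assumption about the intended sign. The `HOTELS` lookup table, its contents, and the input key names are hypothetical.

```python
from datetime import date

# Hypothetical hotel catalogue: hotel id -> (country, stars).
HOTELS = {
    "H001": ("Spain", 4),
    "H002": ("Germany", 3),
}


def reservation_anticipation(reservation_date, checkin_date):
    """Days between booking and check-in (assumed sign: check-in minus booking)."""
    return (checkin_date - reservation_date).days


def add_new_fields(row):
    """Return a copy of the reservation with the three new fields filled in."""
    country, stars = HOTELS[row["HOTEL_ID"]]
    enriched = dict(row)
    enriched["RESERV_ANTICIPATION"] = reservation_anticipation(
        row["RESERVATION_DATE"], row["CHECKIN_DATE"]
    )
    enriched["COUNTRY_HOTEL"] = country
    enriched["HOTEL_STARS"] = stars
    return enriched


booking = {
    "HOTEL_ID": "H001",
    "RESERVATION_DATE": date(2015, 3, 1),
    "CHECKIN_DATE": date(2015, 3, 15),
}
print(add_new_fields(booking))
```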
![Page 32: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/32.jpg)
Clean Dataset
• Dirty: 260,000 reservations, 80 fields, size 150 MB
• Clean: 150,000 reservations, 46 fields, size 75 MB
• Transformed: 150,000 records, 49 fields, size 80 MB
![Page 33: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/33.jpg)
What is Feature Engineering
Extract signal from noise
![Page 34: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/34.jpg)
Feature Engineering Techniques
• Detect fields (features) that are predictors (signal) and skip those that are not (noise)
• Dependent fields (pax, days, pax*days)
• Needless fields (reservation number)
• Fields with very little data
• Random fields (minute and second of reservation)
• Domain knowledge
• Experience
• Recursive cycle
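Two of the noise patterns listed here, fields with very little data and needless identifier-like fields, can be flagged mechanically. The sketch below is a heuristic under stated assumptions: the 5% fill threshold and the "unique on every row" test are illustrative choices, not the speaker's method, and the sample field names are invented. Dependent and random fields still need domain knowledge to spot.

```python
def find_noise_fields(rows, min_fill=0.05):
    """Flag fields that are mostly empty or unique on every row (ID-like)."""
    noise = {}
    total = len(rows)
    if total == 0:
        return noise
    for name in rows[0]:
        values = [r[name] for r in rows if r[name] not in ("", None)]
        if len(values) / total < min_fill:
            noise[name] = "little data"
        elif len(set(values)) == total:
            # One distinct value per row behaves like a reservation number.
            noise[name] = "unique per row"
    return noise


sample = [
    {"RESERVATION_NUMBER": "R1", "PAX": 2, "NOTES": ""},
    {"RESERVATION_NUMBER": "R2", "PAX": 2, "NOTES": ""},
    {"RESERVATION_NUMBER": "R3", "PAX": 4, "NOTES": "late arrival"},
    {"RESERVATION_NUMBER": "R4", "PAX": 2, "NOTES": ""},
]
print(find_noise_fields(sample, min_fill=0.5))
```

On small samples the uniqueness test can also flag genuine continuous features, so the output is a list of candidates to review rather than a final decision.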
![Page 35: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/35.jpg)
Recursive Feature Engineering
Field Selection -> Algorithm Adjustment -> Prediction -> Quality Evaluation -> (back to Field Selection)
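The cycle of field selection, algorithm adjustment, prediction, and quality evaluation can be automated in part. The sketch below uses greedy backward elimination, which is one common way to run such a loop and is named here as a substitute, not as the method used in the talk. The `evaluate` callback stands in for "train a model on these fields and measure its quality"; `toy_score` and its weights are invented so that the example runs without a real model.

```python
def backward_elimination(fields, evaluate, min_fields=1):
    """Repeatedly drop the field whose removal improves the score the most."""
    selected = list(fields)
    best_score = evaluate(selected)
    while len(selected) > min_fields:
        # Score every candidate subset that leaves exactly one field out.
        trials = [
            (evaluate([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        score, drop = max(trials)
        if score <= best_score:      # no removal helps any more: stop
            break
        selected.remove(drop)
        best_score = score
    return selected, best_score


# Toy stand-in for model quality: signal fields help, all others hurt a bit.
SIGNAL = {"RESERV_ANTICIPATION": 0.2, "COUNTRY_HOTEL": 0.3}


def toy_score(fields):
    return sum(SIGNAL.get(f, -0.05) for f in fields)


candidates = [
    "RESERV_ANTICIPATION",
    "COUNTRY_HOTEL",
    "RESERVATION_MINUTE",
    "RESERVATION_NUMBER",
]
print(backward_elimination(candidates, toy_score))
```

In practice each call to `evaluate` retrains the model, so the loop is expensive and is usually combined with domain knowledge to shrink the candidate list first.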
![Page 36: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/36.jpg)
Clean Dataset
• Dirty: 260,000 reservations, 80 fields, size 150 MB
• Clean: 150,000 reservations, 46 fields, size 75 MB
• Transformed: 150,000 records, 49 fields, size 80 MB
• Final Dataset: 150,000 records, 10 fields, size 55 MB
![Page 37: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/37.jpg)
![Page 38: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/38.jpg)
The Process
1. Gathering Data -> “Dirty” RAW Data
2. Cleaning -> “Clean” Data
3. Transformation and Feature Engineering -> New Fields, Calculated
4. Modeling
![Page 39: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/39.jpg)
Modeling
Training Learning
![Page 40: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/40.jpg)
Modeling
![Page 41: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/41.jpg)
![Page 42: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/42.jpg)
Quality Evaluation
Dataset (100%)
• 80% Training -> Model
• 20% Test -> Evaluation
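The 80/20 split on this slide can be sketched with the standard library alone. The function name, the fixed seed, and the use of a plain random shuffle are assumptions for illustration; the talk does not say how its split was drawn.

```python
import random


def train_test_split(rows, test_share=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into (training, test) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = int(round(len(shuffled) * (1 - test_share)))
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(range(100))
print(len(train), len(test))
```

The model is fitted on the training part only, and the held-out test part is used once, for evaluation.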
![Page 43: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/43.jpg)
Quality Evaluation
• Accuracy
• Confusion Matrix
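Both measures named on this slide are easy to compute by hand for a multi-class problem such as nationality. The sketch below is a plain-Python illustration with made-up labels; it stores the confusion matrix as a counter keyed by (actual, predicted) pairs, which is a representation chosen here for brevity.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Share of predictions that match the real label."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs; pairs with equal labels are correct."""
    return Counter(zip(actual, predicted))


real = ["ES", "ES", "DE", "UK"]
guess = ["ES", "DE", "DE", "ES"]
print(accuracy(real, guess))
for (a, p), n in sorted(confusion_matrix(real, guess).items()):
    print(f"actual={a} predicted={p} count={n}")
```

Accuracy gives one overall number, while the confusion matrix shows which nationalities are mistaken for which.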
![Page 44: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/44.jpg)
Quality Evaluation
54% 75%
![Page 45: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/45.jpg)
Quality Evaluation
Predicted vs Real Distribution
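Comparing the predicted and real distributions means checking, per nationality, what share of guests the model predicts against the share that actually arrived. A minimal sketch with invented labels follows; the function names are illustrative.

```python
from collections import Counter


def distribution(labels):
    """Share of each label in the list."""
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}


def distribution_gap(actual, predicted):
    """Predicted share minus real share, per label."""
    real, pred = distribution(actual), distribution(predicted)
    return {c: pred.get(c, 0.0) - real.get(c, 0.0) for c in set(real) | set(pred)}


real = ["ES", "ES", "DE", "UK"]
guess = ["ES", "ES", "ES", "DE"]
print(distribution_gap(real, guess))
```

A positive gap means the model over-predicts that nationality, and a negative gap means it under-predicts it. For staffing decisions this aggregate view can matter more than per-guest accuracy.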
![Page 46: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/46.jpg)
Cooking Predictions
80% / 20%
1. Go to the market to buy ingredients
2. Cleaning
3. Transforming
4. Cooking
5. Tasting the Dish
![Page 47: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/47.jpg)
Cooking Predictions
80% / 20%
1. Gathering RAW data
2. Cleaning Data
3. Transforming and Feature Engineering
4. Training the Model
5. Evaluating Prediction Quality
![Page 48: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/48.jpg)
Other Techniques
• Ensembles
• Clusters
• Weight Analysis
• Anomaly Detection
![Page 49: L9. Real World Machine Learning - Cooking Predictions](https://reader037.fdocuments.us/reader037/viewer/2022110107/58a7c5811a28ab6b5a8b56b1/html5/thumbnails/49.jpg)
END
email: [email protected]
Twitter: @data_lytics
www.clevertask.com