XGBOOST: A SCALABLE TREE BOOSTING SYSTEM
ADVISOR: JIA-LING KOH
SPEAKER: YIN-HSIANG LIAO
2018/04/17, FROM KDD 2016
Outline
Introduction
Method
Experiment
Conclusion
Introduction
Regression tree
CART (Gini)
Boosting
An ensemble method: an iterative procedure that adaptively changes the distribution of the training examples.
AdaBoost
Introduction
The most important factor behind the success of XGBoost: scalability.
It scales to billions of examples.
Introduction
A practical choice:
17 of the 29 winning solutions published on Kaggle's blog in 2015 used XGBoost.
All top-10 teams in KDD Cup 2015 used XGBoost.
T-brain: used by the top-3 teams.
Applications: ad click-through rate prediction, malware classification, customer behavior prediction, etc.
Method
Tree ensemble model:
Prediction: the sum of the outputs of all trees.
Each tree contributes the weight of the leaf that the example falls into.
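The equation on this slide did not survive extraction. In the notation of the KDD 2016 paper it most likely reads as follows, where K (number of trees), q (the tree structure that maps an example to a leaf index), T (number of leaves) and w (the vector of leaf weights) are the paper's symbols, not text recovered from the slide:

```latex
\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F},
\qquad
\mathcal{F} = \{\, f(x) = w_{q(x)} \,\}, \quad
q : \mathbb{R}^m \to \{1, \dots, T\}, \quad w \in \mathbb{R}^T
```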
Method
Regularized objective function:
Loss term: a differentiable convex loss function.
Regularization term (model complexity): penalizes the number of leaves and the weights on the leaves.
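The formula these labels annotate is missing from the transcript. A reconstruction in the paper's notation, assuming l is the loss, T the number of leaves of a tree, w its leaf weights, and gamma and lambda the regularization constants:

```latex
\mathcal{L}(\phi) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2
```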
Objective function
Method
Gradient tree boosting:
The model is trained in an additive manner.
Method
Additive training (Boosting)
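The slide's equations are not in the transcript. As a sketch in the paper's notation, assuming t indexes the boosting round and f_t is the tree added in that round, additive training keeps the earlier trees fixed and chooses f_t to minimize:

```latex
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i),
\qquad
\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\bigl(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) + \Omega(f_t)
```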
Method
Taylor expansion:
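The expansion itself is missing from the transcript. Following the paper, a second-order approximation of the round-t objective would look like this, where g_i and h_i (the first and second derivatives of the loss with respect to the previous prediction) are the paper's symbols:

```latex
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \Bigl[ l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Bigr] + \Omega(f_t),
\qquad
g_i = \partial_{\hat{y}^{(t-1)}}\, l\bigl(y_i, \hat{y}^{(t-1)}\bigr), \quad
h_i = \partial^2_{\hat{y}^{(t-1)}}\, l\bigl(y_i, \hat{y}^{(t-1)}\bigr)

% Dropping the constant term leaves the simplified objective:
\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \Bigl[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Bigr] + \Omega(f_t)
```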
Method
I_j: instance set of leaf j (the x_i that fall in leaf j)
T: number of leaves
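With these definitions the simplified objective can be regrouped leaf by leaf. The form below is a reconstruction from the paper, assuming g_i and h_i are the first- and second-order gradients of the loss, w_j is the weight of leaf j, and gamma and lambda are the regularization constants:

```latex
\tilde{\mathcal{L}}^{(t)}
= \sum_{j=1}^{T} \Bigl[ \Bigl( \sum_{i \in I_j} g_i \Bigr) w_j
  + \frac{1}{2} \Bigl( \sum_{i \in I_j} h_i + \lambda \Bigr) w_j^2 \Bigr] + \gamma T
```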
Method
For a fixed tree q, the optimal weight is:
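The formula is missing from the transcript. Since the per-leaf objective is a quadratic in the leaf weight, setting its derivative to zero gives (paper's notation; I_j is the instance set of leaf j, g_i and h_i the gradient statistics, lambda the L2 penalty):

```latex
w_j^{*} = - \frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
```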
Method
The corresponding optimal value is:
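The value is missing from the transcript; in the paper it is -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T, where G_j and H_j are the sums of g_i and h_i in leaf j. The sketch below evaluates it, together with the optimal leaf weight, on a toy example. The function names and the squared-error toy data are assumptions made for illustration only.

```python
def leaf_weight(g, h, lam):
    """Optimal weight of one leaf: -G / (H + lambda)."""
    G, H = sum(g), sum(h)
    return -G / (H + lam)


def structure_score(leaves, lam, gamma):
    """Optimal objective value for a fixed tree structure.

    `leaves` holds one (g, h) pair per leaf, where g and h are the lists of
    first- and second-order gradients of the examples in that leaf.
    """
    total = 0.0
    for g, h in leaves:
        G, H = sum(g), sum(h)
        total += G * G / (H + lam)
    return -0.5 * total + gamma * len(leaves)


# Squared-error loss l = 1/2 (y - y_hat)^2 gives g = y_hat - y and h = 1.
y = [1.0, 2.0, 3.0]
y_hat = [0.0, 0.0, 0.0]
g = [p - t for p, t in zip(y_hat, y)]  # [-1.0, -2.0, -3.0]
h = [1.0] * len(y)

w = leaf_weight(g, h, lam=1.0)                          # 6 / (3 + 1) = 1.5
score = structure_score([(g, h)], lam=1.0, gamma=0.0)   # -0.5 * 36 / 4 = -4.5
```

A lower score means a better tree structure, which is why it can play the role an impurity measure plays in ordinary decision trees.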
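Method
Once the tree structure is known, the optimal objective value follows directly.
The problem becomes: which tree is the best?
Greedy strategy: grow the tree one split at a time, scoring each candidate split by its loss reduction.
Loss reduction: score of the left subtree plus score of the right subtree, minus the score of the parent.
The larger the better; it might be negative.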
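The loss-reduction formula is missing from the transcript. In the paper's notation, assuming G and H denote the sums of g_i and h_i over the instances sent to the left (L) or right (R) child, and gamma is the per-leaf penalty, it reads:

```latex
\mathcal{L}_{\text{split}} = \frac{1}{2} \left[
  \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}
\right] - \gamma
```

The three fractions are the left subtree, right subtree and parent terms. The value is negative whenever the improvement in the bracket does not cover the cost gamma of the extra leaf.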
Method
Further techniques to prevent overfitting:
Shrinkage.
Column subsampling.
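A minimal sketch of the two techniques, under stated assumptions: `eta` stands for the shrinkage factor and `colsample` for the fraction of features a tree may use. Both helper functions are hypothetical and are not XGBoost's API.

```python
import random


def sample_columns(n_features, colsample, rng):
    """Pick the random subset of feature indices one tree is allowed to use."""
    k = max(1, int(n_features * colsample))
    return sorted(rng.sample(range(n_features), k))


def boosted_update(y_hat, tree_output, eta):
    """Add a new tree's output to the running prediction, scaled by eta."""
    return [p + eta * f for p, f in zip(y_hat, tree_output)]


rng = random.Random(0)
columns = sample_columns(n_features=10, colsample=0.5, rng=rng)  # 5 of 10 features
y_hat = boosted_update([0.0, 0.0], tree_output=[1.0, 2.0], eta=0.1)
```

Shrinkage reduces the influence of each individual tree and leaves room for later trees to improve the model. Column subsampling decorrelates the trees and also reduces the work per tree.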
Method
Basic exact greedy algorithm.
Approximate algorithm:
Global
Local
Split Finding
Method
Basic exact greedy algorithm:
When to stop?
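The algorithm listing from the slide is not in the transcript. The sketch below shows the idea for a single feature: sort the examples once, sweep left to right while accumulating gradient sums, and score every boundary between distinct values. Placing the threshold at the midpoint and returning no split when no gain is positive (one possible answer to the stopping question) are assumptions of this sketch, not statements about the slide.

```python
def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy search for the best threshold on one feature.

    x, g, h: feature value and first/second-order gradient per example.
    Returns (gain, threshold); threshold is None if no split has positive gain.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    parent = G * G / (H + lam)

    best_gain, best_thr = 0.0, None
    GL = HL = 0.0
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        GL += g[i]
        HL += h[i]
        if x[i] == x[nxt]:
            continue  # no boundary between equal feature values
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam) - parent) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, (x[i] + x[nxt]) / 2
    return best_gain, best_thr


# Squared-error gradients for targets [0, 0, 1, 1] and a constant prediction 0.5.
gain, threshold = best_split(
    x=[1.0, 2.0, 3.0, 4.0],
    g=[0.5, 0.5, -0.5, -0.5],
    h=[1.0, 1.0, 1.0, 1.0],
)
```

A full implementation would repeat this scan for every feature at every node and keep the overall best split.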
Method
The basic exact greedy algorithm is powerful because it enumerates all possible splits. However, when the data does not fit in memory, thrashing slows the system down.
Approximations:
Method
Local / global proposal variants:
Global: fewer proposal steps, but more candidate points are needed.
Local: candidates are re-proposed after each split.
Method
Weighted quantile sketch:
Candidate split points are chosen so that each interval has the same “impact” on the objective function.
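In the paper, the "impact" of an example is its second-order gradient h_i, so candidates are spaced evenly in h-weighted rank rather than in raw count. The sketch below computes such candidates exactly on sorted data. It is not the streaming, mergeable sketch data structure from the paper, and the function name and the `eps` spacing parameter are assumptions for illustration.

```python
def weighted_candidates(x, h, eps):
    """Propose split candidates spaced about eps apart in h-weighted rank.

    A new candidate is emitted once at least eps of the total weight has
    accumulated since the previous candidate. The minimum and maximum
    feature values are always included.
    """
    pairs = sorted(zip(x, h))
    total = sum(h)
    candidates = [pairs[0][0]]
    acc = 0.0        # weight of the examples sorted before the current one
    last_rank = 0.0  # weighted rank of the most recent candidate
    for value, weight in pairs:
        rank = acc / total
        if rank - last_rank >= eps and value != candidates[-1]:
            candidates.append(value)
            last_rank = rank
        acc += weight
    if candidates[-1] != pairs[-1][0]:
        candidates.append(pairs[-1][0])
    return candidates


x = [1, 2, 3, 4, 5, 6]
uniform = weighted_candidates(x, [1, 1, 1, 1, 1, 1], eps=0.3)
skewed = weighted_candidates(x, [4, 1, 1, 1, 1, 4], eps=0.3)
```

With uniform weights the candidates are spread evenly over the values. With the skewed weights a candidate appears right after the heavy first example, because that single example already carries a third of the total weight.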
Method
Sparsity-aware split finding:
Possible reasons for sparse input:
Missing values
Frequent zero entries
Artifacts of feature engineering (such as one-hot encoding)
Solution: a default direction in each tree node.
Method
Sort criterion: missing values last.
The best default direction (for the feature at each split) is learned from the data.
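A sketch of how a default direction can be learned for one feature, assuming missing entries are represented as `None`. Only the present values are sorted and scanned. The scan runs twice, once with all missing examples counted on the right and once with them counted on the left, and the better of the two decides the default direction. The function name and the return format are hypothetical.

```python
def best_split_with_default(x, g, h, lam=1.0):
    """Return (gain, threshold, default_direction) for one sparse feature."""
    present = sorted((i for i in range(len(x)) if x[i] is not None),
                     key=lambda i: x[i])
    G, H = sum(g), sum(h)  # totals include the missing examples
    parent = G * G / (H + lam)

    def score(GL, HL):
        GR, HR = G - GL, H - HL
        return 0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam) - parent)

    best = (0.0, None, None)

    # Ascending scan: the left child holds present values only,
    # so every missing example falls to the right.
    GL = HL = 0.0
    for pos in range(len(present) - 1):
        i, nxt = present[pos], present[pos + 1]
        GL += g[i]
        HL += h[i]
        if x[i] != x[nxt]:
            gain = score(GL, HL)
            if gain > best[0]:
                best = (gain, (x[i] + x[nxt]) / 2, "right")

    # Descending scan: the right child holds present values only,
    # so every missing example falls to the left.
    GR = HR = 0.0
    for pos in range(len(present) - 1, 0, -1):
        i, prv = present[pos], present[pos - 1]
        GR += g[i]
        HR += h[i]
        if x[i] != x[prv]:
            gain = score(G - GR, H - HR)
            if gain > best[0]:
                best = (gain, (x[i] + x[prv]) / 2, "left")

    return best


# The two missing examples have the same gradient sign as the example with x = 2.
result = best_split_with_default(
    x=[1.0, 2.0, None, None],
    g=[1.0, -1.0, -1.0, -1.0],
    h=[1.0, 1.0, 1.0, 1.0],
)
```

The cost of each scan depends only on the number of present entries, which is where the speedup on sparse data comes from.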
Method
Non-present entries are treated as missing values.
The algorithm only visits the present entries.
About 50x faster than the naive version on the Allstate dataset.
Method
The most time-consuming part of tree learning: sorting the data.
Sort just once, before training.
Store the data in in-memory units called blocks.
System Design
Method
Each block stores its data in compressed column (CSC) format.
Different blocks can be distributed across machines, or stored on disk in the out-of-core setting.
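The slide's example was an image and is not in the transcript. As a stand-in, here is a small sketch of a column block in which each column is stored contiguously and pre-sorted by feature value, as the paper describes. The three-array layout (`col_ptr`, `row_idx`, `values`) follows the usual CSC convention; the function name and the toy data are assumptions.

```python
def build_block(rows):
    """Build a column-major block from dense rows; None marks a missing entry.

    Returns (col_ptr, row_idx, values). The entries of column j occupy
    positions col_ptr[j]:col_ptr[j + 1] and are sorted by feature value,
    so split finding becomes a linear scan over each column.
    """
    n_cols = len(rows[0])
    col_ptr, row_idx, values = [0], [], []
    for j in range(n_cols):
        entries = sorted((row[j], i) for i, row in enumerate(rows)
                         if row[j] is not None)
        for value, i in entries:
            values.append(value)
            row_idx.append(i)
        col_ptr.append(len(values))
    return col_ptr, row_idx, values


rows = [
    [3.0, None],
    [1.0, 5.0],
    [2.0, 4.0],
]
col_ptr, row_idx, values = build_block(rows)
# col_ptr = [0, 3, 5]
# row_idx = [1, 2, 0, 2, 1]
# values  = [1.0, 2.0, 3.0, 4.0, 5.0]
```

The row indices are what link each sorted value back to the gradient statistics of its example.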
Method
The block structure helps split finding.
However, fetching gradient statistics by row index causes non-contiguous memory access.
Solution: allocate an internal buffer in each thread and prefetch the gradient statistics into it.
Method
Block size (the maximum number of examples in a block) matters.
Small blocks give each thread too little work, which makes parallelization inefficient.
Large blocks lead to cache misses.
Balance!
Method
Out-of-core computation:
Block compression
Ex: [0, 2, 2, 0, 1, 2]
Block sharding
A prefetch thread is assigned to each disk.
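One part of block compression described in the paper is storing each row index as a 16-bit offset from the first row of its block. The sketch below illustrates only that part, using the standard-library `array` module. Reading the slide's example values as such offsets is an assumption, and the function names are hypothetical.

```python
from array import array


def compress_row_index(row_idx, block_begin):
    """Store row indices as unsigned 16-bit offsets from the block start.

    Raises OverflowError if an offset does not fit, which is why the
    number of examples per block has to be bounded.
    """
    return array("H", (r - block_begin for r in row_idx))


def decompress_row_index(offsets, block_begin):
    """Recover the absolute row indices from the stored offsets."""
    return [block_begin + o for o in offsets]


row_idx = [70000, 70002, 70002, 70000, 70001, 70002]
offsets = compress_row_index(row_idx, block_begin=70000)
restored = decompress_row_index(offsets, block_begin=70000)
# list(offsets) == [0, 2, 2, 0, 1, 2]
```

Decompression runs in a separate thread while the block is loaded, trading some CPU time for less disk reading.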
Experiment
Classification:
R's GBM greedily expands only one branch of a tree.
The other two (XGBoost and scikit-learn) expand the full tree.
Experiment
Learning to rank:
pGBRT: the best previously published system for this task.
pGBRT only supports the approximate algorithm.
Experiment
Out-of-core experiment:
Compression gives about a 3x speedup.
Sharding across two disks gives a further 2x speedup.
Conclusion
The most important feature: scalability!
Lessons from building XGBoost:
Sparsity-aware split finding, weighted quantile sketch, cache-aware access, and parallelization.
Fin.