Andrey Gulin, "Introduction to MatrixNet"
Moscow, 03.07.2012
Andrey Gulin
MatrixNet
Why is this relevant? — CERN event classification problem
— CERN solution quality 94.4%
— Yandex MatrixNet solution quality 95.8%
Machine Learning — Deterministic processes -> programming, computer science, etc.
— Noisy data -> statistics, machine learning, etc.
— Supervised / Unsupervised / Semi-supervised
— Offline / online learning
ML applications — Yandex: ranking, spam classification, user behavior modeling, etc.
— CERN: event classification, etc.
— Finance: fraud detection, credit scoring, etc.
— …
Binary classification problem — An offline supervised problem
— Given samples of 2 classes, predict the class of an unseen sample
— For each sample we know N real-valued features {x_i}
Solution quality measures — ROC (receiver operating characteristic) curve
— AUC (area under curve)
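AUC has a direct probabilistic reading: it is the chance that a randomly chosen positive sample is scored above a randomly chosen negative one. A minimal sketch of computing it that way (the labels and scores below are made-up illustration data):

```python
def auc(labels, scores):
    """AUC via pairwise comparison of positives vs. negatives; ties count as half."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auc(labels, scores))  # 8 of 9 positive/negative pairs are ordered correctly
```

A perfect ranker gets AUC 1.0; random scoring gives 0.5.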
Solution quality measures — Precision Recall curve
— BEP (Break Even Point), precision == recall
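The break-even point can be found by sweeping a decision threshold and picking the point where precision and recall meet. A small sketch with made-up labels and scores:

```python
def precision_recall_at(labels, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for label, s in zip(labels, scores) if label == 1 and s >= threshold)
    fp = sum(1 for label, s in zip(labels, scores) if label == 0 and s >= threshold)
    fn = sum(1 for label, s in zip(labels, scores) if label == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
# Sweep candidate thresholds; the BEP is where |precision - recall| is smallest.
bep = min((precision_recall_at(labels, scores, t) for t in sorted(set(scores))),
          key=lambda pr: abs(pr[0] - pr[1]))
print(bep)  # here precision == recall == 2/3 at threshold 0.7
```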
Solution quality measures — Log likelihood / cross entropy = sum of log P(true class)
— A convex function with derivatives
— Used as a proxy for non-continuous measures like AUC/BEP
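The proxy idea in one short sketch: log likelihood sums the log-probabilities a model assigns to the true classes, and unlike AUC or BEP it is smooth, so it can be driven by gradient methods. The probabilities below are illustrative:

```python
import math

def log_likelihood(labels, probs):
    """Sum of log P(true class); higher (closer to 0) is better."""
    return sum(math.log(p if label == 1 else 1.0 - p)
               for label, p in zip(labels, probs))

confident = log_likelihood([1, 1, 0], [0.9, 0.8, 0.1])
hedged = log_likelihood([1, 1, 0], [0.6, 0.6, 0.4])
print(confident > hedged)  # well-calibrated, confident predictions score higher
```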
Methods — Nearest Neighbors
— SVM
— Logistic regression (linear regression with a logistic transform of the result)
— “Neural” networks = non-linear regression
— Decision Trees
— Boosted Decision Trees
Decision Tree (diagram: a tree splitting on F1 > 3 at the root, with F2 > 3 and F1 > 6 as the next-level tests)
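One plausible reading of the slide's diagram as code, where the root tests F1 > 3 and each branch applies one more test; the leaf values here are hypothetical, since in a fitted tree they would be learned scores:

```python
def tree_predict(f1, f2):
    # Root split; the right subtree re-tests F1, the left tests F2.
    # Leaf values (0.1, 0.4, 0.6, 0.9) are made up for illustration.
    if f1 > 3:
        return 0.9 if f1 > 6 else 0.6
    return 0.4 if f2 > 3 else 0.1

print(tree_predict(7, 0))  # → 0.9
print(tree_predict(1, 5))  # → 0.4
```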
Bootstrapping — Take N random samples with replacement from the original set
— An easy way to estimate all sorts of statistics over the set
— Including building a model of the set
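A minimal bootstrap sketch: resample the data with replacement many times and look at the spread of a statistic, here the mean. The data and resample count are arbitrary illustration values:

```python
import random

def bootstrap_means(data, n_resamples, seed=0):
    """Resample with replacement n_resamples times; return the mean of each resample."""
    rng = random.Random(seed)
    n = len(data)
    return [sum(rng.choice(data) for _ in range(n)) / n
            for _ in range(n_resamples)]

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
means = sorted(bootstrap_means(data, 1000))
# The middle 95% of resampled means gives a rough confidence interval.
print(means[25], means[975])
```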
Boosting — Building a strong model as a combination of “weak models”
— An iterative process; on each iteration we
— Approximate the current residual with the best “weak model”
— Scale the new “weak model” by a small number
— Add it to the solution
— Approximating the loss function gradient instead of the residual gives gradient boosting
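The loop above can be sketched for squared loss, where the residual is exactly the negative gradient. Here the "weak model" is a one-split decision stump; the stump fitter, learning rate, and data are all illustrative choices:

```python
def fit_stump(xs, residuals):
    """Best single-threshold split on one feature under squared loss."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def boost(xs, ys, n_iters=50, rate=0.1):
    """Gradient boosting with squared loss: fit stump to residuals, scale, add."""
    stumps = []
    predictions = [0.0] * len(xs)
    for _ in range(n_iters):
        residuals = [y - p for y, p in zip(ys, predictions)]  # negative gradient
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + rate * stump(x) for p, x in zip(predictions, xs)]
    return lambda x: sum(rate * s(x) for s in stumps)

model = boost([1, 2, 3, 4, 5, 6], [1.0, 1.1, 0.9, 3.0, 3.1, 2.9])
```

After enough iterations the ensemble approaches the group means near 1.0 and 3.0; the small `rate` is what makes the process gradual and acts as regularization.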
Overfitting
Boosting — Greedy selection and scaling produce a regularization effect
— If the “weak model” is a scaled feature, then boosting produces an L1-regularized solution
— If the “weak model” is a greedily constructed decision tree, then boosting gives a form of hierarchical sparsity constraint
MatrixNet
MatrixNet — MatrixNet is an implementation of the gradient boosted decision trees algorithm
— MatrixNet differs a bit from the standard version:
— Using oblivious trees
— Accounting for the sample count in each leaf
Oblivious Trees (diagram: a tree splitting on F1 > 3 at the root, with the same condition F2 > 3 at both nodes of the next level)
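Because every node at a given depth tests the same condition, a depth-d oblivious tree is just d conditions plus a table of 2^d leaf values, and evaluation is a bitmask lookup. A sketch with hypothetical conditions and leaf values:

```python
# Conditions mirror the slide's diagram (F1 > 3, then F2 > 3 everywhere
# at the next level); the leaf values are made up for illustration.
conditions = [lambda f: f["F1"] > 3, lambda f: f["F2"] > 3]
leaf_values = [0.1, 0.4, 0.6, 0.9]  # indexed by the condition bitmask

def oblivious_predict(features):
    index = 0
    for bit, cond in enumerate(conditions):
        if cond(features):
            index |= 1 << bit  # each condition sets one bit of the leaf index
    return leaf_values[index]

print(oblivious_predict({"F1": 5, "F2": 1}))  # → 0.4
```

This layout is why oblivious trees evaluate fast: no branching per node, just condition bits and one table read.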
Accounting for leaf sample count — Prefer trees with large averages in leaves with many samples
— E.g. multiplying the leaf average by sqrt(N/(N+100)) (N is the leaf sample count) produces a better model
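The slide's multiplier shrinks leaf values toward zero when a leaf holds few samples, so poorly supported leaves contribute little. A short sketch (the residual values are illustrative):

```python
import math

def damped_leaf_value(residuals):
    """Leaf average scaled by the slide's sqrt(N / (N + 100)) multiplier."""
    n = len(residuals)
    return (sum(residuals) / n) * math.sqrt(n / (n + 100))

# A leaf with 4 samples is shrunk hard; one with 400 is barely touched.
print(damped_leaf_value([1.0] * 4))    # ≈ 0.196
print(damped_leaf_value([1.0] * 400))  # ≈ 0.894
```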
Questions?