Random Forests: One Tool for All Your Problems
Transcript of "Random Forests: One Tool for All Your Problems"
Outline: Motivations · Generic Model · Supervised Learning · (Un & Semi)-supervised Learning · Statistical Properties · Future Directions
Random Forests: One Tool for All Your Problems
Neil Houlsby and Novi Quadrianto
RCC, 4th July 2013
Neil Houlsby and Novi Quadrianto Department of Engineering, University of Cambridge
Motivation
• This talk: random forests
• A forest is an ensemble of trees. The trees are all slightly different from one another.
Motivation
Why should you care about random forest?
• Random forests do what humans do when making important decisions
For example: a scientific paper reviewing process
Properties:
◦ The more people, the better the decision
◦ The more diverse the set of people, the better
◦ It is an embarrassingly parallel process
Motivation
Why should you care about random forest?
• "Everybody" in Cambridge talks about it
Motivation
Why should you care about random forest?
• One tool for all your learning problems:
◦ classification
◦ regression
◦ density estimation
◦ manifold learning
◦ semi-supervised learning
◦ . . .
Motivation
Why should you care about random forest?
• Did I mention that random forests achieve state-of-the-art performance and are a workhorse in several industrial applications?
Industrial Application: Semantic Segmentation in Kinect
Success stories of random forests from Kinect for Microsoft Xbox 360
Industrial Application: Semantic Segmentation in Kinect
Task: Classification
input: depth frame output: body part
• input space X = {images}
• output space Y = {l.hand, r.hand, head, l.shoulder, r.shoulder, . . . }: 31 body parts
J. Shotton et al., Real-Time Human Pose Recognition in Parts from Single Depth Images, CVPR 2011.
References
• A. Criminisi, J. Shotton, E. Konukoglu, Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning, Foundations and Trends in Computer Graphics and Vision, 2012 (available online as an MSR technical report)
• G. Biau, L. Devroye, and G. Lugosi, Consistency of Random Forests and Other Averaging Classifiers, JMLR, 2008 (statistical properties of random forests)
• G. Biau, Analysis of a Random Forests Model, JMLR, 2012 (statistical properties of random forests)
A Decision Tree
• datapoint v ∈ Rd (d can be large)
• data is injected at the root of the tree and encounters three node types: root, split, and leaf nodes
• testing: a datapoint descends the tree via O(log K) binary choices (for a balanced tree with K leaves)
• training: optimise the node parameters using the data
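The test-time descent above can be sketched in a few lines (a minimal illustration, not the speakers' code; all class and variable names here are hypothetical):

```python
# Minimal sketch of descending a trained binary decision tree: at each split
# node a binary test routes the datapoint left or right until a leaf is
# reached; depth is O(log K) for a balanced tree with K leaves.

class Node:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, prediction=None):
        self.feature = feature        # feature index tested at a split node
        self.threshold = threshold    # decision threshold tau
        self.left = left
        self.right = right
        self.prediction = prediction  # set only at leaf nodes

def descend(node, v):
    """Route datapoint v from the root to a leaf; return the leaf prediction."""
    while node.prediction is None:    # split node: apply the binary test
        node = node.left if v[node.feature] <= node.threshold else node.right
    return node.prediction            # leaf node: stored prediction

# tiny hand-built tree: one split on feature 0 at threshold 0.5
tree = Node(feature=0, threshold=0.5,
            left=Node(prediction="class A"),
            right=Node(prediction="class B"))
print(descend(tree, [0.3]))  # -> class A
print(descend(tree, [0.9]))  # -> class B
```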
Training a Tree I
• training set S0
• node j splits the data Sj = SjL ∪ SjR
• each node (weak learner) has parameters θ = {φ, ψ, τ}:
◦ φ(v): feature selection function
◦ ψ: decision parameters
◦ τ: decision thresholds
Training a Tree II
• decision function: h(v, θj) ∈ {0, 1}
• during training, optimise the energy function:
θj* = arg max_θj Ij(Sj, SjL, SjR, θj)
Training a Tree III
The energy function is usually some measure of 'information gain'.
θj* = arg max_θj Ij(Sj, SjL, SjR, θj),
Ij = H(Sj) − Σ_{i∈{L,R}} (|Sj^i| / |Sj|) H(Sj^i)
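The split objective is easy to compute directly; a minimal sketch for classification with Shannon entropy (illustrative names, not the talk's code):

```python
# Information gain I_j = H(S_j) - sum_i |S_j^i|/|S_j| * H(S_j^i)
# for a binary split of a labelled set, using Shannon entropy.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of the empirical class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(S, S_left, S_right):
    """Gain of splitting label set S into S_left and S_right."""
    return entropy(S) - sum(len(Si) / len(S) * entropy(Si)
                            for Si in (S_left, S_right))

S = ["a", "a", "b", "b"]
print(information_gain(S, ["a", "a"], ["b", "b"]))  # perfect split -> 1.0
print(information_gain(S, ["a", "b"], ["a", "b"]))  # uninformative -> 0.0
```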
Weak Learner
• stump / linear boundary:
h(v, θj) = [τ1 > φ(v) · ψ > τ2]
• conic section (2D):
h(v, θj) = [τ1 > φ(v)T ψ φ(v) > τ2]
• usually τ1 = +∞ or τ2 = −∞
Trees to Forests
• forest: an ensemble of trees, with randomness introduced during training
• two approaches: subsampling the training data (bagging), and Randomised Node Optimisation (RNO)
• RNO:
◦ denote by T the set of all possible parameter settings θ = {φ, ψ, τ}
◦ at each node, sample a finite subset Tj ⊂ T of the possible parameters
◦ define the 'randomness' ρ = |Tj|
◦ ρ = 1: maximum decorrelation, but splits are data-independent; ρ = |T|: identical trees
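A sketch of RNO at a single node, reusing the entropy-gain objective on 1-D threshold tests (all names are illustrative, not the talk's code):

```python
# Randomised Node Optimisation: instead of searching all candidate parameter
# settings T, each node scores only a random subset T_j of size rho and keeps
# the best one. Here a 'parameter setting' is just a 1-D threshold tau.
import random
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(S, tau):
    """Information gain of thresholding (x, label) pairs at x <= tau."""
    left = [y for x, y in S if x <= tau]
    right = [y for x, y in S if x > tau]
    ys = [y for _, y in S]
    return entropy(ys) - sum(len(p) / len(S) * entropy(p)
                             for p in (left, right))

def rno_split(S, all_taus, rho, rng=random):
    """Score a random subset T_j (|T_j| = rho) and return the best threshold."""
    T_j = rng.sample(all_taus, min(rho, len(all_taus)))
    return max(T_j, key=lambda tau: gain(S, tau))

S = [(0.1, "a"), (0.2, "a"), (0.8, "b"), (0.9, "b")]
taus = [i / 10 for i in range(1, 10)]
# rho = |T| recovers the exhaustive optimum; rho = 1 is a fully random split
best = rno_split(S, taus, rho=len(taus))
print(best)  # some tau in [0.2, 0.7] that separates the two classes
```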
Prediction and Ensembling
• each tree outputs p(c|v); c may be e.g. a class label or a continuous output
• predictions can be probabilistic (e.g. a histogram) or point estimates
• combine predictions by averaging or by a normalised product:
p(c|v) = (1/T) Σ_{t=1}^{T} pt(c|v),  or  p(c|v) = (1/Z) Π_{t=1}^{T} pt(c|v)
• the product is less robust to noise and to overconfident trees (the trees are not independent)
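A toy illustration of why the product rule is fragile (the numbers are made up for illustration):

```python
# Combining per-tree class posteriors p_t(c|v) by linear averaging versus a
# normalised product: a single overconfident tree barely moves the average
# but completely dominates the product.

def average(ps):
    """Linear ensemble: mean of the per-tree posteriors."""
    T = len(ps)
    return [sum(p[c] for p in ps) / T for c in range(len(ps[0]))]

def product(ps):
    """Normalised product of the per-tree posteriors."""
    raw = [1.0] * len(ps[0])
    for p in ps:
        raw = [r * pc for r, pc in zip(raw, p)]
    Z = sum(raw)
    return [r / Z for r in raw]

trees = [[0.6, 0.4], [0.6, 0.4], [0.001, 0.999]]  # one overconfident tree
print(average(trees))  # ~[0.40, 0.60]: mild shift toward class 1
print(product(trees))  # ~[0.002, 0.998]: dominated by the overconfident tree
```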
Summary of Parameters
• tree parameters:
◦ stopping criteria / tree depth D
◦ tree randomness ρ
◦ node test parameters: weak learner model h(v, θ), feature selector function φ(v)
◦ training objective function Ij(Sj, SjL, SjR, θj)
◦ leaf prediction model pt(c|v)
• forest parameters:
◦ forest size T
◦ ensemble model
Classification
• choose: H(S) = Shannon entropy, pt(c|v) = empirical distribution at the leaf, linear ensemble averaging
• properties:
◦ naturally handles multi-class problems
◦ probabilistic output
◦ good generalisation and efficiency in practice
◦ max-margin-like behaviour
◦ parameter tuning: sensitivity, efficiency/accuracy trade-off
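This recipe (entropy splits, empirical leaf distributions, linear averaging) is what off-the-shelf implementations follow; a sketch using scikit-learn's RandomForestClassifier, assuming the library is installed (the data here is synthetic and purely illustrative):

```python
# Classification forest on two well-separated 2-D Gaussian blobs.
from sklearn.ensemble import RandomForestClassifier
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

forest = RandomForestClassifier(
    n_estimators=200,     # forest size T
    criterion="entropy",  # Shannon-entropy information gain
    max_depth=5,          # stopping criterion D
    random_state=0,
).fit(X, y)

# predict_proba averages the empirical leaf distributions over the trees
print(forest.predict_proba([[0, 0], [4, 4]]))
```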
The Effect of Forest Size
• axis-aligned weak learner, depth D = 2
• all trees classify the training data perfectly
• larger forests give better generalisation and more uncertainty further from the training data (and train in parallel)
The Effect of Tree Depth
• multiclass, T = 200, conic weak learner
• high uncertainty away from the data, max-margin-like effects
• depth governs predictive confidence and under/over-fitting
• larger D requires more computation time
Weak Learner and Randomness
• weak learner: computational/accuracy trade-off
• increased D (or T) can compensate for an 'overly simple' weak learner
• increased randomness (smaller ρ) decreases tree correlation, removes artifacts, and lowers confidence
each column: a different weak learner (L to R: stump, linear, conic); top: D = 5, bottom: D = 13; left: ρ = 500, right: ρ = 5
Max-margin Properties I
• simple scenario: two classes, d = 2, D = 2, T large, stump weak learner, RNO, ρ → |T|
• assumption: equally optimal parameters are chosen uniformly
◦ the forest posterior prediction (linear ensembling of pt(c|x)) then changes linearly between the classes
• assuming equal losses → max-margin decision boundary
Max-margin Properties II
• assume separability along the x1 axis:
h(v|θj) = [φ(v) > τ], φ(v) = x1
• in the limit, assume a uniform distribution over the equally optimal separating planes:
lim_{ρ→|T|, T→∞} p(c = c1|x1) = (x1 − x1′) / ∆, for all x1 ∈ [x1′, x1′′]
where x1′, x1′′ are the 'support vectors' and ∆ is the 'gap'
• optimising the separating line, τ* = arg min_τ |p(c = c1|x1 = τ) − p(c = c2|x1 = τ)|, yields:
lim_{ρ→|T|, T→∞} τ* = x1′ + ∆/2
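This limiting argument can be checked numerically: sampling stump thresholds uniformly in the gap and averaging their indicator predictions gives a posterior that rises linearly from x1′ to x1′′ and crosses 1/2 at the midpoint (a toy sketch with made-up support-vector positions):

```python
# Max-margin limit: equally optimal stump thresholds drawn uniformly in the
# gap [a, b]; the averaged forest posterior p(c1|x1) is linear in the gap and
# the decision boundary sits at the midpoint a + gap/2.
import random

random.seed(0)
a, b = 0.3, 0.7                                  # 'support vectors' x1', x1''
taus = [random.uniform(a, b) for _ in range(100_000)]

def p_c1(x1):
    """Fraction of stumps [x1 > tau] voting for class c1 at x1."""
    return sum(x1 > t for t in taus) / len(taus)

print(p_c1(0.4))  # ~ (0.4 - 0.3) / 0.4 = 0.25: linear rise across the gap
print(p_c1(0.5))  # ~ 0.5: decision boundary at the midpoint of the gap
```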
Max-margin Properties III
• adding randomness has a similar effect to 'slack variables' in SVMs
Max-margin Properties IV
• bagging does not yield maximum margin separation
• but training is faster
combining bagging (50%) with random selection of optimal parameters
Empirical Evaluations
• evaluation on many datasets using accuracy, RMSE, and AUC¹
• figure: average (over metrics) cumulative score over datasets of increasing dimensionality
◦ scores are normalised (median subtracted); a positive gradient indicates better-than-average performance as the dimensionality increases
¹ Caruana, R. et al., An Empirical Evaluation of Supervised Learning in High Dimensions, ICML 2008
Regression Forests
• rename the output c ∈ {1, . . . , C} to y ∈ R; in the model, simply replace:
◦ the leaf prediction model with p(y|v) (point-wise or probabilistic)
◦ assume an underlying linear regression
◦ the training objective function (Gaussian prediction):
Ij(Sj, SjL, SjR, θj) = Σ_{v∈Sj} log(|Λy(v)|) − Σ_{i∈{L,R}} Σ_{v∈Sj^i} log(|Λy(v)|)
where Λy(v) is the predictive covariance matrix at v
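For a 1-D Gaussian leaf model the differential entropy is a monotone function of the log-variance, so the objective above reduces to minimising the size-weighted log-variances of the children. A self-contained 1-D sketch of one regression split (illustrative names, not the talk's code):

```python
# Regression split under a Gaussian leaf model: pick the threshold that
# maximises log-variance reduction (equivalent, up to constants, to the
# differential-entropy gain for 1-D Gaussians).
from math import log

def log_var(ys, eps=1e-12):
    """Log of the (biased) variance, with a floor for pure leaves."""
    mu = sum(ys) / len(ys)
    return log(sum((y - mu) ** 2 for y in ys) / len(ys) + eps)

def gain(S, tau):
    """Entropy-style gain of splitting (x, y) pairs at x <= tau."""
    left = [y for x, y in S if x <= tau]
    right = [y for x, y in S if x > tau]
    if not left or not right:
        return float("-inf")
    ys = [y for _, y in S]
    return log_var(ys) - sum(len(p) / len(S) * log_var(p)
                             for p in (left, right))

# piecewise-constant data: y jumps from 0 to 1 at x = 0.5
S = [(x / 10, 0.0 if x < 5 else 1.0) for x in range(10)]
best = max((x / 10 for x in range(1, 10)), key=lambda tau: gain(S, tau))
print(best)  # -> 0.4: the split separating the two constant pieces
```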
Comparison to Gaussian Processes
note: comparison only to a ‘vanilla’ GP with SE kernel
| property | GPs | RFs |
| --- | --- | --- |
| interpolation/extrapolation | more uncertainty further from data; kernel dependent | more uncertainty further from data; weak learner dependent |
| predictions | smoothly varying posterior mean; uni-modal | multi-modal |
Comparison to Gaussian Processes - Ambiguous Output
• neither model is appropriate here; the RF yields larger uncertainty
◦ using maximum marginal likelihood hyper-parameter optimization
• better modelled with density estimation
Application: Semantic Parsing of 3D CT Scans
left: 2D slice, center: 3D reconstruction, right: automatically localised kidney
• task: localize anatomical structures in a 3D scan
• uses:
◦ transmitting relevant parts of a scan over low-bandwidth networks
◦ tracking radiation dose
◦ efficient labelling/navigation/browsing of scans
◦ image registration
Semantic Parsing of 3D CT Scans – RF Regression
• regression from voxel to relative location of bounding box
p → b(p), where p = (x, y, z) ∈ R3,
b(p) = (dL(p), dR(p), dA(p), dP(p), dH(p), dF(p)) ∈ R6
• each voxel ‘votes’ on the location of the box
Semantic Parsing of 3D CT Scans – Features
• a single feature dimension: the average intensity in a 3D box Bi at location q (relative to p):
xi = (1/|Bi|) Σ_{q∈Bi} J(q)
• feature vector v(p) = (x1, . . . , xd) ∈ Rd
• d can be unbounded – when training, sample feature dimensions at random and compute them on demand
figure: the feature box (not to be confused with the target)
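Box-average features of this kind can typically be evaluated on demand in O(1) using a summed-area (integral) image; a 2-D sketch of the idea under that assumption (the talk's setting is 3-D, and all names here are illustrative):

```python
# Box-average intensity feature via a summed-area (integral) image: the mean
# of any axis-aligned box comes from four table lookups, so randomly sampled
# feature dimensions are cheap to compute on demand.
import numpy as np

J = np.arange(25, dtype=float).reshape(5, 5)  # toy intensity image
I = J.cumsum(axis=0).cumsum(axis=1)           # integral image
Ipad = np.pad(I, ((1, 0), (1, 0)))            # zero row/col for box borders

def box_mean(r0, c0, r1, c1):
    """Mean of J[r0:r1, c0:c1] from four integral-image lookups."""
    s = Ipad[r1, c1] - Ipad[r0, c1] - Ipad[r1, c0] + Ipad[r0, c0]
    return s / ((r1 - r0) * (c1 - c0))

print(box_mean(1, 1, 3, 3))  # matches J[1:3, 1:3].mean() exactly
```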
Semantic Parsing of 3D CT Scans – Model
• multivariate Gaussian probabilistic-constant model
• axis-aligned weak learner:
h(v, θj) = [φ(v, Bj) > τj], where φ(v, Bj) = xj
• i.e. a threshold on the average intensity in a single feature box Bj
• optimise the node parameters θj = (Bj, τj) ∈ R7 with the standard differential-entropy criterion
Semantic Parsing of 3D CT Scans – Results
localisation of right kidney, red = prediction, blue = ground truth
• error ≈ 5mm
• robust to large variability in shape, position etc. of kidney
• note missing left lung in one case (RHS)
• localisation of 25 structures takes ≈ 4s on a single core
Semantic Parsing of 3D CT Scans – Feature Discovery
• each node represents a cluster of points
• predictive confidence identifies salient features
green regions = feature boxes of the parent nodes' tests, for leaf nodes with high predictive confidence
Density Estimation
The Problem:
• Given a set of unlabelled observations, estimate the probability density function that generated the data
The Solution: Density forests
• Density forests are ensembles of clustering trees
Density Forests
• choose:
  - $H(S) = \frac{1}{2}\log\big((2\pi e)^d\,|\Lambda(S)|\big)$, i.e. the differential entropy of a d-variate Gaussian with $d \times d$ covariance matrix $\Lambda(S)$
  - $p_t(\mathbf{v})$ = a Gaussian distribution over a bounded domain
  - linear ensemble averaging
• properties: a density forest is a generalisation of GMMs, but
  - multiple hard clusterings are created (one per tree), instead of a single soft clustering
  - each input point is explained by multiple clusters, instead of a single linear combination of Gaussians
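The entropy choice above is directly computable from a sample. Below is a minimal numpy sketch of the differential-entropy term used when scoring splits; `gaussian_differential_entropy` is an illustrative name, not from the talk, and the data are toy draws.

```python
import numpy as np

def gaussian_differential_entropy(S):
    """H(S) = 0.5 * log((2*pi*e)^d * |Lambda(S)|) for a d-variate
    Gaussian fitted to the rows of the sample matrix S."""
    d = S.shape[1]
    cov = np.atleast_2d(np.cov(S, rowvar=False))
    sign, logdet = np.linalg.slogdet(cov)  # stable log-determinant
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(0)
S = rng.standard_normal((100_000, 2))      # roughly N(0, I_2)
H = gaussian_differential_entropy(S)
# the true entropy of N(0, I_2) is log(2*pi*e) ≈ 2.84
```

During tree training this continuous entropy plays the role the discrete class entropy plays in classification forests: the split maximising the resulting information gain is chosen.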
Density Forests
The prediction model

$$p_t(\mathbf{v}) = \frac{\pi_{l(\mathbf{v})}}{Z_t}\,\mathcal{N}\big(\mathbf{v}\,\big|\,\mu_{l(\mathbf{v})}, \Lambda_{l(\mathbf{v})}\big); \qquad \pi_l = \frac{|S_l|}{|S_0|}$$

The partition function $Z_t$

$$Z_t = \int \Big(\sum_l \pi_l\, \mathcal{N}(\mathbf{v}\,|\,\mu_l, \Lambda_l)\, p(l|\mathbf{v})\Big)\, d\mathbf{v}; \qquad p(l|\mathbf{v}) = \mathbb{I}[\mathbf{v} \in l]$$
Density Forests
Approximating the partition function
$$Z_t = \int \pi_{l(\mathbf{v})}\, \mathcal{N}\big(\mathbf{v}\,\big|\,\mu_{l(\mathbf{v})}, \Lambda_{l(\mathbf{v})}\big)\, d\mathbf{v}$$

• compute via the cumulative multivariate normal for axis-aligned weak learners
• compute via grid-based numerical integration:

$$Z_t \approx \Delta \sum_i \pi_{l(\mathbf{v}_i)}\, \mathcal{N}\big(\mathbf{v}_i\,\big|\,\mu_{l(\mathbf{v}_i)}, \Lambda_{l(\mathbf{v}_i)}\big)$$
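As a concrete illustration of the grid-based approximation, here is a toy 1-D, two-leaf tree (the leaf statistics are made up); each leaf stores its fraction of the training data π_l and a Gaussian fit to its partition:

```python
import numpy as np

# Hypothetical 1-D tree: two leaves split at x = 0, each storing its
# data fraction pi_l and a Gaussian (mu, var) fitted to its partition.
leaves = [
    {"lo": -6.0, "hi": 0.0, "pi": 0.4, "mu": -1.0, "var": 0.5},
    {"lo":  0.0, "hi": 6.0, "pi": 0.6, "mu":  1.5, "var": 1.0},
]

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def unnormalised(x):
    # pi_{l(x)} * N(x | mu_{l(x)}, var_{l(x)}), zero outside the domain
    for l in leaves:
        if l["lo"] <= x < l["hi"]:
            return l["pi"] * normal_pdf(x, l["mu"], l["var"])
    return 0.0

# Z_t ≈ Δ Σ_i pi_{l(x_i)} N(x_i | ...), on a regular grid of spacing Δ
grid = np.linspace(-6.0, 6.0, 6001)
delta = grid[1] - grid[0]
Z_t = delta * sum(unnormalised(x) for x in grid)

def tree_density(x):
    return unnormalised(x) / Z_t     # now integrates to ~1
```

Note that Z_t comes out below 1 because each leaf Gaussian loses mass outside its cell; that is precisely why the normalisation is needed.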
Density Forests – effect of model parameters
Effect of tree depth D
T = 200, weak learner = axis-aligned, predictor = multivariate Gaussian
Density Forests – effect of model parameters
Effect of forest size T
Interesting: even if individual trees heavily over-fit (at D = 6), increasing the forest size T produces smoother densities.
Always: set T to a sufficiently large value.
Density Forests – comparisons with GMM
Interesting: the use of randomness (in density forests or GMMs) improves results
Density Forests – sampling from densities
1. Sample t uniformly from {1, . . . , T} to select a tree in the forest
2. Start at the root node, then randomly generate the child index with probability proportional to the number of training points along each edge (edge thickness)
3. Repeat step 2 until a leaf is reached
4. At the leaf, sample a point from the domain-bounded Gaussian
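A minimal sketch of this ancestral-sampling procedure, with a hypothetical hand-built one-split tree (step 4's domain truncation is omitted for brevity):

```python
import random

# Hypothetical one-split tree: edge weights are the fractions of
# training points routed to each child; leaves store a Gaussian fit.
tree = {
    "weights": (0.3, 0.7),
    "left":  {"leaf": (-1.0, 0.5)},   # (mu, sigma)
    "right": {"leaf": ( 2.0, 1.0)},
}
forest = [tree]                        # a forest of T = 1 tree here

def sample_from_forest(forest, rng):
    node = rng.choice(forest)          # step 1: pick a tree uniformly
    while "leaf" not in node:          # steps 2-3: walk down to a leaf
        w_left, w_right = node["weights"]
        go_left = rng.random() < w_left / (w_left + w_right)
        node = node["left"] if go_left else node["right"]
    mu, sigma = node["leaf"]
    return rng.gauss(mu, sigma)        # step 4: sample the leaf Gaussian

rng = random.Random(0)
draws = [sample_from_forest(forest, rng) for _ in range(10_000)]
```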
Density Forests – sampling from densities
Top row: learned densities from 100 training points
Bottom row: 10,000 random points sampled from the density forests
Density Forests – regressing non-functional relations
The Problem:
• For a given value of input x, there may be multiple values of output y
The Solution:
• Estimate p(x, y) via density forests, then compute the required conditional
Density Forests – regressing non-functional relations
Restriction: the weak learner model is axis-aligned
Density Forests – regressing non-functional relations
Tree density:

$$p_t(x, y) = \frac{\pi_l}{Z_t}\,\mathcal{N}\big((x, y)\,\big|\,\mu_l, \Lambda_l\big); \qquad \mu_l = (\mu_x, \mu_y), \quad \Lambda_l = \begin{pmatrix} \sigma^2_{xx} & \sigma^2_{xy} \\ \sigma^2_{xy} & \sigma^2_{yy} \end{pmatrix}$$

Tree conditional:

$$p_t(y\,|\,x = x^*) = \frac{1}{Z_{t,x^*}} \sum_{l \in L_{t,x^*}} \mathbb{I}[y^B_l \le y < y^T_l]\, \pi_l\, \mathcal{N}\big(y\,\big|\,\mu_{y|x,l}, \Lambda_{y|x,l}\big)$$

with

$$\mu_{y|x,l} = \mu_y + \frac{\sigma^2_{xy}}{\sigma^2_{xx}}(x^* - \mu_x); \qquad \Lambda_{y|x,l} = \sigma^2_{yy} - \frac{\sigma^4_{xy}}{\sigma^2_{xx}}$$

The partition function, where $\Phi_{t,l}$ denotes the CDF of the leaf's conditional Gaussian:

$$Z_{t,x^*} = \sum_{l \in L_{t,x^*}} \pi_l \big(\Phi_{t,l}(y^T_l\,|\,x = x^*) - \Phi_{t,l}(y^B_l\,|\,x = x^*)\big)$$
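Per leaf, the conditional reduces to the standard bivariate-Gaussian conditioning formulas; a small sketch with made-up leaf statistics:

```python
import numpy as np

# Hypothetical leaf statistics: mean (mu_x, mu_y) and 2x2 covariance
mu = np.array([0.5, 1.0])
Lam = np.array([[1.0, 0.6],
                [0.6, 2.0]])

def conditional_y_given_x(x_star, mu, Lam):
    s_xx, s_xy, s_yy = Lam[0, 0], Lam[0, 1], Lam[1, 1]
    mean = mu[1] + (s_xy / s_xx) * (x_star - mu[0])   # mu_{y|x,l}
    var = s_yy - s_xy ** 2 / s_xx                      # Lambda_{y|x,l}
    return mean, var

m, v = conditional_y_given_x(2.0, mu, Lam)   # m = 1.9, v = 1.64
```

Summing such per-leaf conditionals, each weighted by π_l and restricted to the leaf's y-range, and renormalising by Z_{t,x*} is what makes the tree conditional multi-modal.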
Density Forests – regressing non-functional relations
Multi-modality captured!
Manifold Learning
The Problem:
• Given a set of k unlabelled observations {v_1, v_2, . . . , v_k} with v_i ∈ R^d, find a smooth mapping f : R^d → R^{d'} with d' ≪ d that preserves the observations' relative distances.
The Solution: Manifold forests
• Manifold forests build on density forests
Manifold Forests – for non-linear dimensionality reduction
Recall: each tree in a density forest defines a clustering of the input points.
The affinity model: for each clustering tree t we can compute an association matrix $W^t$ with entries $W^t_{ij} = \exp(-D^t(\mathbf{v}_i, \mathbf{v}_j))$, with the distance $D^t$ chosen as:

Mahalanobis

$$D^t(\mathbf{v}_i, \mathbf{v}_j) = \begin{cases} \mathbf{d}_{ij}^\top \big(\Lambda^t_{l(\mathbf{v}_i)}\big)^{-1} \mathbf{d}_{ij} & \text{if } l(\mathbf{v}_i) = l(\mathbf{v}_j) \\ \infty & \text{otherwise} \end{cases}; \qquad \mathbf{d}_{ij} = \mathbf{v}_i - \mathbf{v}_j$$

Gaussian

$$D^t(\mathbf{v}_i, \mathbf{v}_j) = \begin{cases} \mathbf{d}_{ij}^\top \mathbf{d}_{ij} / \varepsilon^2 & \text{if } l(\mathbf{v}_i) = l(\mathbf{v}_j) \\ \infty & \text{otherwise} \end{cases}$$

Binary

$$D^t(\mathbf{v}_i, \mathbf{v}_j) = \begin{cases} 0 & \text{if } l(\mathbf{v}_i) = l(\mathbf{v}_j) \\ \infty & \text{otherwise} \end{cases}$$
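The binary model is the cheapest of the three and needs nothing beyond leaf assignments; a small numpy sketch (the leaf indices are made up):

```python
import numpy as np

def binary_affinity(leaf):
    """W^t_ij = exp(-D^t) under the binary distance: 1 if points i and j
    land in the same leaf of tree t, 0 otherwise."""
    leaf = np.asarray(leaf)
    return (leaf[:, None] == leaf[None, :]).astype(float)

# leaf[i] = index of the leaf that point i reaches in this tree
W_t = binary_affinity([0, 0, 1, 1, 1])
```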
Manifold Forests – for non-linear dimensionality reduction
The ensemble model: the affinity matrix for the entire forest is

$$W = \frac{1}{T} \sum_{t=1}^{T} W^t$$

The mapping function: Laplacian eigen-maps
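Averaging the per-tree matrices and running a plain (unnormalised) Laplacian eigen-map can be sketched as follows; the two leaf assignments are made up, and a real implementation would typically use the normalised Laplacian and sparse eigensolvers:

```python
import numpy as np

def binary_affinity(leaf):
    # W^t from the binary affinity model: 1 iff same leaf
    leaf = np.asarray(leaf)
    return (leaf[:, None] == leaf[None, :]).astype(float)

def laplacian_eigenmap(W, d_prime):
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian L = D - W
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return vecs[:, 1:1 + d_prime]           # drop the constant eigenvector

# Two hypothetical trees' clusterings of the same four points
W_ts = [binary_affinity([0, 0, 1, 1]), binary_affinity([0, 0, 0, 1])]
W = sum(W_ts) / len(W_ts)                   # forest affinity (1/T) Σ_t W^t
emb = laplacian_eigenmap(W, d_prime=1)      # 1-D embedding of the 4 points
```

Points that land in the same leaf in every tree end up with identical embedding coordinates, while points that are rarely co-clustered are pushed apart.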
Manifold Forests – effect of model parameters
Effect of forest size T
Manifold Forests – effect of model parameters
Effect of affinity model D(·, ·)
Semi-supervised Learning
The Problem:
• Given a set of both labelled and unlabelled observations, associate a class label with all unlabelled data
The Solution: Semi-supervised forests
• A semi-supervised forest is a collection of trees trained on a mixed information gain function with two components, one supervised and one unsupervised
Semi-supervised Forests
Forest training: via mixed information gains $I = I_u + \alpha I_s$

Transduction: by label propagation on each tree (assigning class labels to the already available unlabelled data points)

$$c(\mathbf{v}_u) = c\Big(\operatorname*{argmin}_{\mathbf{v}_l \in \mathcal{L}} D(\mathbf{v}_u, \mathbf{v}_l)\Big); \qquad \forall\, \mathbf{v}_u \in \mathcal{U}$$

with geodesic distance

$$D(\mathbf{v}_u, \mathbf{v}_l) = \min_{\Gamma \in \{\Gamma\}} \sum_{i=0}^{L(\Gamma)-1} d(\mathbf{s}_i, \mathbf{s}_{i+1})$$

and local (Mahalanobis) distances

$$d(\mathbf{s}_i, \mathbf{s}_{i+1}) = \frac{1}{2}\big(\mathbf{d}^\top \Lambda^{-1}_{l(\mathbf{s}_i)} \mathbf{d} + \mathbf{d}^\top \Lambda^{-1}_{l(\mathbf{s}_{i+1})} \mathbf{d}\big); \qquad \mathbf{d} = \mathbf{s}_i - \mathbf{s}_{i+1}$$
n.b. averaging over the forest yields probabilistic transduction
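The transduction step is a shortest-path computation; below is a toy sketch using Dijkstra's algorithm over precomputed local distances (the graph weights, node names, and labels are all made up):

```python
import heapq

# Toy graph: node -> [(neighbour, local distance d(s_i, s_j)), ...].
# "l*" nodes are labelled points, "u*" nodes are unlabelled.
graph = {
    "l0": [("u0", 1.0)],
    "l1": [("u2", 1.0)],
    "u0": [("l0", 1.0), ("u1", 1.0)],
    "u1": [("u0", 1.0), ("u2", 4.0)],
    "u2": [("u1", 4.0), ("l1", 1.0)],
}
labels = {"l0": "A", "l1": "B"}

def geodesic_distances(source):
    """Dijkstra: shortest path lengths from source to every node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# assign each unlabelled point the label of its geodesically
# nearest labelled point
dists = {l: geodesic_distances(l) for l in labels}
assigned = {
    u: labels[min(labels, key=lambda l: dists[l].get(u, float("inf")))]
    for u in ("u0", "u1", "u2")
}
```

Note how u2 inherits label B even though it is only a few hops from the A-labelled side: the geodesic distance through the expensive (4.0) edge dominates.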
Semi-supervised Forests
Induction: we want a generic classification function for previously unseen test points
• each semi-supervised tree, with its newly labelled data points, defines a class posterior p_t(c|v)
• the forest class posterior is just the linear ensemble average
Semi-supervised Forests – effect of model parameters
Effect of forest size T
• S-shaped decision boundary
• greater uncertainty further from the labelled data
Semi-supervised Forests – comparisons with SVM and TSVM
bottom: more noise in locations of unlabelled data
Consistency Results
"Despite growing interest and practical use, there has been little exploration of the statistical properties of random forests, and little is known about the mathematical forces driving the algorithm." (Biau, 2012)
Most theoretical studies have concentrated on isolated parts or stylized versions of the algorithm:
• Breiman, Consistency for a Simple Model of Random Forests, Tech. Report, 2004
• Biau, Devroye, and Lugosi, Consistency of Random Forests and Other Averaging Classifiers, JMLR, 2008
• Biau, Analysis of a Random Forests Model, JMLR, 2012
• Denil, Matheson, and de Freitas, Consistency of Online Random Forests, ICML, 2013
Consistency Results
• Let $g_n$ be a binary-valued classifier learned from n data points
• As n varies, we obtain a sequence of classifiers $\{g_n\}$
• The sequence $\{g_n\}$ is consistent (i.e. the probability of error of $g_n$ converges in probability to the Bayes risk) if

$$L(g_n) = \Pr\big(g_n(X, Z) \neq Y \,\big|\, D_n\big) \to L^*, \quad \text{as } n \to \infty$$

Main results of Biau, Devroye, and Lugosi, 2008
• averaged classifiers are consistent whenever the base classifiers are (Denil, Matheson, and de Freitas, 2013 built on this)
• the tool: connections to locally weighted average classifiers
• a purely random forest (with data-independent splits) is consistent
• bagging preserves consistency of the base rule, and may even create consistent rules from inconsistent ones
• some greedily grown random forests, including Breiman's random forest, are inconsistent
Consistency Results
Main results of Biau, 2012
• Biau, 2012 achieves the closest match between theory and practice, with just one caveat: it requires a second data set, not used to fit the leaf predictors, in order to make decisions about variable importance when growing the trees
• greedily grown random forests with this extra sample are consistent, and the rate of convergence depends only on the number of strong variables, not on the dimension of the ambient space
• The variance of the forest is of the order $k_n/(n(\log k_n)^{S/2d})$. Letting $k_n = n$, the variance is of the order $1/(\log n)^{S/2d}$, which still goes to 0 as n grows. This gives insight into why random forests do a good job despite the fact that individual trees are not pruned.
Consistency Results
From theory back to practice (Biau, 2012)
Observation: as n grows, the probability of cuts concentrates on the informative dimensions only.
Consistency Results
From theory back to practice (Biau, 2012)
Observation: the overall performance of the alternative method (via the extra sample) is very similar to Breiman's original method.
What else to do with random forests?
• forests for structured prediction (S. Nowozin et al., Decision Tree Fields, ICCV, 2011)
  - the structured problem
  - forests + field model
  - some results
What else to do with random forests?
• Bayesian random forests (shameless plug: N. Quadrianto and Z. Ghahramani, A Very Simple Bayesian Random Forest)
[Figure: scatter plot of numbered training points on the unit square, with the annotation "height = 161"]
1740
1741174217431744174517461747
1748
17491750
1751
1752175317541755175617571758175917601761176217631764176517661767176817691770177117721773177417751776177717781779178017811782178317841785178617871788178917901791179217931794179517961797179817991800180118021803180418051806180718081809181018111812181318141815181618171818181918201821182218231824182518261827182818291830183118321833183418351836183718381839184018411842184318441845184618471848184918501851185218531854185518561857185818591860186118621863186418651866186718681869187018711872187318741875187618771878187918801881188218831884188518861887188818891890189118921893189418951896189718981899190019011902190319041905190619071908190919101911191219131914191519161917191819191920192119221923192419251926192719281929193019311932193319341935193619371938193919401941194219431944194519461947194819491950195119521953195419551956195719581959196019611962196319641965196619671968196919701971197219731974197519761977197819791980198119821983198419851986198719881989199019911992199319941995199619971998199920002001200220032004200520062007200820092010201120122013201420152016201720182019202020212022202320242025202620272028202920302031203220332034203520362037203820392040204120422043204420452046
20472048
2049
2050
20512052205320542055205620572058205920602061206220632064206520662067206820692070207120722073207420752076207720782079208020812082
20832084
20852086208720882089209020912092209320942095209620972098209921002101210221032104210521062107210821092110211121122113211421152116
2117
2118
2119
21202121212221232124212521262127212821292130213121322133213421352136213721382139214021412142214321442145214621472148214921502151215221532154215521562157215821592160216121622163216421652166216721682169217021712172217321742175217621772178217921802181218221832184218521862187218821892190219121922193219421952196219721982199220022012202220322042205220622072208220922102211221222132214221522162217221822192220222122222223222422252226222722282229223022312232223322342235223622372238223922402241224222432244224522462247
224822492250
22512252225322542255
2256
2257
22582259
2260
226122622263
2264
2265
226622672268
2269
2270
2271
227222732274
2275
2276227722782279228022812282228322842285228622872288228922902291229222932294229522962297229822992300230123022303230423052306
2307
2308
2309
2310231123122313231423152316231723182319232023212322232323242325232623272328232923302331233223332334233523362337233823392340234123422343234423452346234723482349235023512352
2353
2354
2355
2356
23572358
2359
23602361
2362236323642365236623672368
23692370
237123722373237423752376237723782379
2380
23812382238323842385238623872388238923902391239223932394239523962397
2398
239924002401240224032404240524062407
2408
2409
24102411
2412
24132414
2415
2416
24172418
2419242024212422242324242425242624272428242924302431243224332434243524362437243824392440244124422443
24442445
2446
244724482449
2450
24512452
2453
2454
2455
2456
24572458
245924602461246224632464246524662467
2468
2469
2470
2471247224732474247524762477247824792480248124822483248424852486248724882489249024912492249324942495249624972498249925002501250225032504250525062507250825092510251125122513251425152516251725182519252025212522252325242525252625272528252925302531
25322533
2534
25352536
253725382539
254025412542
254325442545
25462547254825492550
25512552255325542555
2556
255725582559
2560
2561256225632564256525662567256825692570257125722573
257425752576
2577257825792580258125822583258425852586258725882589259025912592259325942595259625972598259926002601260226032604260526062607
2608260926102611
2612
2613
2614
26152616
2617
261826192620262126222623262426252626
2627
26282629263026312632263326342635263626372638263926402641264226432644264526462647264826492650265126522653265426552656265726582659266026612662266326642665266626672668266926702671267226732674267526762677267826792680268126822683268426852686268726882689269026912692269326942695269626972698269927002701270227032704270527062707270827092710271127122713
2714
2715
27162717
2718
271927202721
2722
2723
2724
2725
2726
272727282729273027312732273327342735273627372738273927402741274227432744274527462747274827492750275127522753275427552756275727582759276027612762276327642765276627672768276927702771277227732774277527762777277827792780278127822783278427852786278727882789279027912792279327942795279627972798279928002801280228032804280528062807280828092810281128122813281428152816281728182819282028212822282328242825282628272828282928302831283228332834
28352836
2837283828392840284128422843284428452846284728482849
2850
28512852285328542855
2856
2857285828592860286128622863286428652866
2867
2868
2869
287028712872
2873
287428752876287728782879288028812882288328842885288628872888288928902891289228932894289528962897289828992900290129022903290429052906290729082909291029112912291329142915
291629172918
29192920292129222923292429252926292729282929293029312932293329342935293629372938293929402941294229432944294529462947294829492950295129522953
2954
2955
29562957
2958
29592960
2961296229632964296529662967
29682969297029712972297329742975297629772978297929802981298229832984298529862987298829892990
2991
2992
29932994
2995
299629972998299930003001300230033004300530063007300830093010301130123013301430153016301730183019302030213022302330243025302630273028302930303031303230333034303530363037303830393040304130423043304430453046304730483049305030513052305330543055305630573058305930603061306230633064306530663067306830693070307130723073307430753076307730783079308030813082308330843085308630873088308930903091309230933094309530963097309830993100310131023103310431053106310731083109311031113112311331143115311631173118311931203121312231233124312531263127312831293130313131323133313431353136313731383139314031413142314331443145314631473148314931503151315231533154315531563157315831593160316131623163316431653166316731683169317031713172317331743175317631773178317931803181318231833184318531863187318831893190319131923193319431953196319731983199320032013202320332043205320632073208320932103211321232133214321532163217321832193220322132223223322432253226322732283229323032313232323332343235323632373238323932403241324232433244324532463247324832493250325132523253325432553256325732583259326032613262326332643265326632673268326932703271
327232733274
3275327632773278327932803281328232833284328532863287
32883289329032913292329332943295329632973298329933003301330233033304330533063307330833093310331133123313331433153316331733183319332033213322332333243325332633273328
3329333033313332
33333334333533363337333833393340334133423343
3344334533463347334833493350335133523353335433553356335733583359336033613362336333643365336633673368336933703371337233733374337533763377337833793380338133823383338433853386338733883389339033913392339333943395339633973398339934003401
3402
34033404340534063407340834093410341134123413341434153416341734183419342034213422342334243425342634273428342934303431343234333434343534363437343834393440344134423443
34443445
3446
3447
3448
344934503451345234533454345534563457345834593460346134623463346434653466346734683469347034713472347334743475347634773478347934803481348234833484348534863487348834893490349134923493349434953496349734983499350035013502350335043505350635073508350935103511351235133514351535163517351835193520352135223523352435253526
3527
3528
3529
3530353135323533353435353536353735383539354035413542354335443545354635473548
3549
35503551355235533554
3555
3556355735583559356035613562
3563
3564
3565
3566
35673568356935703571
357235733574357535763577357835793580358135823583358435853586358735883589359035913592359335943595359635973598
3599360036013602360336043605360636073608
3609361036113612361336143615
36163617
3618
36193620
3621
3622
362336243625362636273628362936303631363236333634363536363637363836393640364136423643364436453646364736483649365036513652365336543655365636573658365936603661366236633664366536663667366836693670
3671
3672367336743675367636773678
367936803681
3682
36833684
368536863687368836893690369136923693
3694
3695
36963697
3698369937003701370237033704
37053706
3707
3708
3709371037113712371337143715371637173718371937203721372237233724372537263727372837293730373137323733373437353736373737383739374037413742374337443745374637473748374937503751375237533754375537563757375837593760376137623763376437653766376737683769377037713772377337743775377637773778377937803781378237833784378537863787378837893790
3791
3792
3793
3794
379537963797
3798
3799
3800
3801
3802
3803
38043805380638073808
38093810381138123813381438153816381738183819382038213822382338243825382638273828382938303831383238333834383538363837383838393840384138423843384438453846
3847
384838493850
3851
3852
38533854
38553856385738583859
3860
3861
38623863
3864386538663867386838693870387138723873387438753876387738783879388038813882388338843885388638873888388938903891389238933894389538963897389838993900390139023903390439053906390739083909391039113912391339143915391639173918391939203921392239233924392539263927392839293930393139323933393439353936393739383939394039413942394339443945394639473948394939503951395239533954395539563957395839593960396139623963396439653966396739683969397039713972397339743975397639773978397939803981398239833984398539863987398839893990399139923993399439953996399739983999400040014002400340044005
4006
400740084009401040114012401340144015
4016
4017
4018
4019
4020
4021
4022
4023
40244025402640274028
40294030
4031
4032
4033
403440354036
40374038403940404041404240434044404540464047
4048
40494050405140524053
40544055
4056
405740584059
4060
4061406240634064406540664067406840694070407140724073407440754076407740784079408040814082408340844085408640874088
408940904091409240934094409540964097409840994100410141024103410441054106410741084109411041114112411341144115411641174118411941204121412241234124412541264127412841294130413141324133413441354136413741384139414041414142414341444145414641474148414941504151415241534154415541564157415841594160416141624163416441654166416741684169
4170
41714172417341744175417641774178417941804181418241834184418541864187418841894190419141924193419441954196419741984199420042014202420342044205420642074208420942104211421242134214421542164217421842194220422142224223
4224
4225
42264227
4228
42294230423142324233
42344235423642374238
42394240
424142424243424442454246424742484249
42504251
4252
4253
4254
4255
4256
42574258
4259
4260
4261426242634264426542664267
42684269427042714272
4273
4274
4275
427642774278
4279428042814282428342844285
4286
4287
4288
4289
429042914292
429342944295
4296
429742984299
4300
43014302430343044305
43064307
430843094310
431143124313
431443154316431743184319432043214322432343244325432643274328432943304331433243334334433543364337433843394340434143424343434443454346434743484349435043514352435343544355435643574358435943604361436243634364436543664367436843694370437143724373437443754376437743784379
4380
4381
4382
438343844385438643874388438943904391439243934394439543964397439843994400440144024403440444054406440744084409441044114412441344144415441644174418441944204421442244234424442544264427442844294430443144324433443444354436443744384439444044414442444344444445444644474448444944504451445244534454445544564457445844594460446144624463446444654466446744684469447044714472447344744475447644774478447944804481448244834484448544864487
448844894490
4491
4492449344944495
449644974498
4499
4500
45014502
4503
45044505450645074508
4509
45104511
451245134514451545164517451845194520452145224523452445254526452745284529453045314532453345344535453645374538453945404541454245434544454545464547454845494550455145524553455445554556455745584559456045614562456345644565456645674568
4569
4570
4571
457245734574457545764577457845794580458145824583458445854586458745884589459045914592459345944595459645974598459946004601460246034604460546064607460846094610461146124613461446154616461746184619
46204621
462246234624
4625
4626
4627
4628
4629
463046314632
4633
4634
4635
4636
4637
4638
46394640464146424643464446454646464746484649465046514652465346544655465646574658465946604661466246634664466546664667466846694670467146724673467446754676467746784679468046814682468346844685468646874688468946904691469246934694469546964697469846994700470147024703470447054706470747084709471047114712471347144715471647174718471947204721472247234724472547264727
47284729
4730
47314732
47334734473547364737473847394740474147424743474447454746474747484749
475047514752
47534754475547564757
47584759476047614762476347644765476647674768476947704771477247734774477547764777
4778477947804781478247834784478547864787
478847894790479147924793479447954796479747984799480048014802480348044805480648074808480948104811481248134814481548164817481848194820482148224823482448254826482748284829483048314832483348344835483648374838483948404841484248434844484548464847484848494850485148524853485448554856485748584859486048614862486348644865486648674868486948704871487248734874487548764877487848794880488148824883488448854886488748884889
4890
4891489248934894489548964897489848994900490149024903490449054906490749084909491049114912491349144915491649174918491949204921492249234924492549264927492849294930493149324933493449354936493749384939494049414942494349444945494649474948494949504951495249534954495549564957495849594960
4961
4962
4963
49644965
4966
4967
49684969497049714972497349744975497649774978497949804981498249834984498549864987498849894990499149924993499449954996499749984999500050015002500350045005500650075008
50095010501150125013
5014
5015
5016
5017501850195020502150225023
5024
50255026
5027502850295030503150325033503450355036503750385039
50405041504250435044504550465047504850495050505150525053505450555056505750585059506050615062506350645065506650675068
50695070
5071
50725073
5074
50755076507750785079508050815082508350845085508650875088
5089
5090
5091509250935094509550965097509850995100510151025103510451055106510751085109511051115112511351145115511651175118511951205121512251235124512551265127512851295130513151325133513451355136513751385139514051415142514351445145514651475148514951505151515251535154515551565157515851595160516151625163516451655166516751685169517051715172517351745175517651775178517951805181518251835184518551865187518851895190519151925193519451955196519751985199520052015202520352045205520652075208520952105211
[Figure: Tree Samples with λ = 0.5 — three random tree samples, each drawn as an axis-aligned partition of the unit square [0, 1]², with sampled tree heights 14, 8, and 3.]
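The figure showed random trees with λ = 0.5 rendered as axis-aligned partitions of the unit square, with heights varying widely across samples. The exact generative process is not recoverable from the slides; as an illustrative sketch only, the code below assumes a simple prior in which every cell splits with probability λ into two children along a uniformly chosen axis-aligned cut (the name `sample_partition` and this particular prior are assumptions, not the authors' construction):

```python
import random

def sample_partition(box=((0.0, 1.0), (0.0, 1.0)), split_prob=0.5,
                     max_depth=20, rng=None):
    """Sample a random axis-aligned partition of `box`.

    Each cell splits with probability `split_prob` (standing in for the
    slide's lambda): a dimension is chosen uniformly at random and cut
    at a uniform point. Returns (tree_height, list_of_leaf_cells).
    """
    rng = rng or random.Random(0)
    # Stop splitting at the depth cap or with probability 1 - split_prob.
    if max_depth == 0 or rng.random() >= split_prob:
        return 0, [box]
    dim = rng.randrange(len(box))
    lo, hi = box[dim]
    cut = rng.uniform(lo, hi)
    left = tuple((lo, cut) if d == dim else iv for d, iv in enumerate(box))
    right = tuple((cut, hi) if d == dim else iv for d, iv in enumerate(box))
    h_l, cells_l = sample_partition(left, split_prob, max_depth - 1, rng)
    h_r, cells_r = sample_partition(right, split_prob, max_depth - 1, rng)
    return 1 + max(h_l, h_r), cells_l + cells_r
```

With split probability 0.5 the branching process is critical, so repeated draws give trees of very different heights, consistent with the three samples (heights 14, 8, and 3) shown on the slide.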
• forests for ranking (S. Clémençon, M. Depecker, N. Vayatis, Ranking Forests, JMLR, 2013)
• . . .
Thank You
Thank you for your attention.