A walk in random forests (transcript of slides, irma.math.unistra.fr/~gardes/SEMINAIRE/scornet.pdf)
A walk in random forests
Erwan Scornet (LSTA, Institut Curie), supervised by Gérard Biau (LSTA) and Jean-Philippe Vert (Institut Curie)
Séminaire Statistiques - IRMA, Strasbourg, October 2015
Erwan Scornet Random forests
Background on random forests
Random forests are a class of algorithms used to solve regression and classification problems.
They are often used in applied fields since they handle high-dimensional settings.
They have good predictive power and can outperform state-of-the-art methods.
Background on random forests
But theoretical results are not yet entirely sufficient to explain their good accuracy.
1 Construction of random forests
2 Random forests and kernel methods
3 Consistency of Breiman forests
General framework of the presentation
Regression setting
We are given a training set D_n = {(X_1, Y_1), \ldots, (X_n, Y_n)}, where the pairs (X_i, Y_i) \in [0, 1]^d \times \mathbb{R} are i.i.d., distributed as (X, Y).
We assume that
Y = m(X) + \varepsilon,
where \varepsilon \sim \mathcal{N}(0, \sigma^2). We want to build an estimate of the regression function m using the random forest algorithm.
How to build a tree?
Breiman random forests are defined by
1 A splitting rule: minimize the square loss.
2 A stopping rule: leave exactly one point in each cell.
How to perform splits of Breiman’s forests?
For a cut direction j \in \{1, \ldots, d\} and a split position z \in [0, 1], the criterion takes the form

L_n(j, z) = \frac{1}{N_n(A)} \sum_{i=1}^{n} \Big( Y_i - \bar{Y}_{A_L} \mathbf{1}_{X_i^{(j)} < z} - \bar{Y}_{A_R} \mathbf{1}_{X_i^{(j)} \geq z} \Big)^2,

where
A_L = \{x \in A : x^{(j)} < z\} and A_R = \{x \in A : x^{(j)} \geq z\};
\bar{Y}_A is the average of the Y_i's belonging to A;
N_n(A) is the number of points in A.
How to perform splits of Breiman’s forests?
An example: j = 1 and z = 0.5.
[Figure: a cell A containing data points with responses Y_i (16.2, 14.8, 17.1, 5.8, 16.2, 7.1, 6.2, 5.7, 5.5), cut vertically at x^{(1)} = 0.5.]
How to perform splits of Breiman’s forests?
An example: j = 1 and z = 0.5.
[Figure: the same cell, with the averages \bar{Y}_{A_L} and \bar{Y}_{A_R} highlighted on each side of the split.]

L_n(1, 0.5) = \frac{1}{N_n(A)} \sum_{i=1}^{n} \Big( Y_i - \underbrace{\bar{Y}_{A_L}}_{\text{average on } A_L} \mathbf{1}_{X_i^{(1)} < 0.5} - \underbrace{\bar{Y}_{A_R}}_{\text{average on } A_R} \mathbf{1}_{X_i^{(1)} \geq 0.5} \Big)^2.
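The criterion above can be sketched in code. A minimal, self-contained illustration in pure Python (the example data and function names are hypothetical, not the author's implementation):

```python
def split_criterion(points, j, z):
    """L_n(j, z): mean squared residual over the cell after splitting
    along coordinate j at position z. points is a list of (x, y) pairs,
    with x a tuple of features in [0, 1]^d."""
    left = [y for x, y in points if x[j] < z]
    right = [y for x, y in points if x[j] >= z]
    if not left or not right:
        return float("inf")  # degenerate split: one side is empty
    mean_left = sum(left) / len(left)
    mean_right = sum(right) / len(right)
    sq = sum((y - mean_left) ** 2 for y in left)
    sq += sum((y - mean_right) ** 2 for y in right)
    return sq / len(points)

def best_split(points, directions):
    """Minimize L_n(j, z) over the given cut directions, trying the
    observed coordinates as candidate split positions."""
    best = None
    for j in directions:
        for x, _ in points:
            score = split_criterion(points, j, x[j])
            if best is None or score < best[0]:
                best = (score, j, x[j])
    return best
```

For instance, four points that separate cleanly along the first coordinate: `best_split([((0.1,), 5.0), ((0.2,), 5.2), ((0.8,), 16.0), ((0.9,), 16.4)], [0])` selects j = 0 with z = 0.8.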
Construction of random forests
Randomness in tree construction
Resample the data set via bootstrap;
At each node, preselect a subset of mtry variables eligible for splitting.
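These two randomization steps can be sketched as one draw of the random variable Θ. A minimal sketch (the names `draw_theta` and `eligible_directions` are illustrative, not from the slides):

```python
import random

def draw_theta(n, d, mtry, rng):
    """One realization of the randomness Theta: a bootstrap sample of the
    n observations, plus a sampler of mtry candidate split coordinates
    to be called afresh at each node."""
    bootstrap_idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement

    def eligible_directions():
        return rng.sample(range(d), mtry)  # mtry distinct coordinates

    return bootstrap_idx, eligible_directions
```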
Literature
Random forests were created by Breiman [2001].
Many extensions have been proposed to
solve ranking problems [Clémençon et al., 2013],
solve survival analysis problems [Ishwaran et al., 2008],
perform quantile estimation [Meinshausen, 2006],
and to improve computation time [Geurts et al., 2006].
Many theoretical results focus on simplified versions of random forests, whose construction is independent of the dataset [Biau et al., 2008, Ishwaran and Kogalur, 2010, Biau, 2012, Genuer, 2012, Zhu et al., 2012].
Asymptotic normality of random forests [Mentch and Hooker, 2014, Wager, 2014].
Random prediction or not?

Tree estimate:

m_n(x, \Theta) = \sum_{i=1}^{n} \frac{\mathbf{1}_{X_i \in A_n(x, \Theta)}}{N_n(x, \Theta)} Y_i,

where N_n(x, \Theta) is the number of points in the cell A_n(x, \Theta).

M-finite forest estimate:

m_{M,n}(x, \Theta_1, \ldots, \Theta_M) = \frac{1}{M} \sum_{m=1}^{M} m_n(x, \Theta_m).

Conditionally on D_n, the estimate m_{M,n} depends on \Theta_1, \ldots, \Theta_M. Letting M grow,

m_{M,n}(x, \Theta_1, \ldots, \Theta_M) \xrightarrow[M \to \infty]{} \underbrace{\mathbb{E}_{\Theta}[m_n(x, \Theta)]}_{m_{\infty,n}(x)}.
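The convergence of the finite forest to its Θ-expectation is plain Monte Carlo averaging. A toy sketch, assuming a purely illustrative randomized predictor (not an actual tree): averaging over more draws of Θ reduces the Θ-variance of the prediction while leaving its expectation unchanged.

```python
import random
import statistics

def tree_predict(x, rng):
    """Toy Theta-randomized predictor: signal 2x plus Theta-dependent noise."""
    return 2.0 * x + rng.gauss(0.0, 1.0)

def forest_predict(x, M, seed=0):
    """M-finite forest estimate: average tree_predict over M draws of Theta."""
    rng = random.Random(seed)
    return sum(tree_predict(x, rng) for _ in range(M)) / M
```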
1 Construction of random forests
2 Random forests and kernel methods
3 Consistency of Breiman forests
Erwan Scornet Random forests
Theoretical difficulties for studying random forests
The infinite random forest estimate takes the form

m_{\infty,n}(x) = \sum_{i=1}^{n} Y_i \, \mathbb{E}_{\Theta}\left[ \frac{\mathbf{1}_{X_i \in A_n(x, \Theta)}}{N_n(x, \Theta)} \right],

where N_n(x, \Theta) is the number of points in the cell A_n(x, \Theta).
Two different difficulties:
The number of points in each cell is unknown.
The tree dependency on the random variable \Theta is unknown.
Kernel based on Random Forests (KeRF)
[Figure: the partitions of three random trees of the forest over [0, 1]^2, each cell labelled with the responses Y_i falling in it (values such as 5.3, 5.5, 5.7, 5.8, 6.0, 6.2, 6.8, 7.1, 14.8, 15.1, 16.2, 17.1, 18).]

Infinite KeRF estimate:

\tilde{m}_{\infty,n}(x) = \frac{\sum_{i=1}^{n} Y_i K_k(x, X_i)}{\sum_{j=1}^{n} K_k(x, X_j)},

where K_k(x, X_i) = \mathbb{P}_{\Theta}[X_i \in A_n(x, \Theta)].
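The KeRF estimate is a plain weighted average once the kernel values are available; in practice K_k(x, X_i) can be approximated by the fraction of trees in which X_i falls in the same cell as x. A minimal sketch (the function name is illustrative):

```python
def kerf_predict(responses, kernel_weights):
    """KeRF estimate at a point x: responses[i] is Y_i and
    kernel_weights[i] approximates K_k(x, X_i)."""
    denom = sum(kernel_weights)
    if denom == 0.0:
        return 0.0  # x shares a cell with no observation
    num = sum(y * w for y, w in zip(responses, kernel_weights))
    return num / denom
```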
Breiman KeRF vs Breiman random forests
[Figure: comparison of Breiman KeRF and Breiman random forests on two simulated models. Left: n = 800, d = 50, Y = X_1^2 + \exp(-X_2^2). Right: n = 600, d = 100, Y = -\sin(2 X_1) + X_2^2 + X_3 - \exp(-X_4) + \mathcal{N}(0, 0.5).]
A simple model: the centred forest
[Figure: step-by-step construction of a centred tree in dimension 2 — at each node a split direction is chosen uniformly at random (probability p = 1/2 per coordinate) and the cell is cut at its midpoint.]
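A minimal sketch of the centred-tree construction pictured above, tracking only the cell that contains a given point x (an illustrative toy, not the author's code): at each level a coordinate is drawn uniformly (probability 1/d, i.e. 1/2 in the two-dimensional picture) and the cell is cut at its midpoint.

```python
import random

def centred_cell(x, k, d, rng):
    """Cell of a depth-k centred tree containing x in [0, 1]^d, as a list
    of per-coordinate (low, high) intervals."""
    cell = [(0.0, 1.0)] * d
    for _ in range(k):
        j = rng.randrange(d)          # split direction, uniform on {0, ..., d-1}
        low, high = cell[j]
        mid = (low + high) / 2.0      # split at the center of the cell
        cell[j] = (low, mid) if x[j] < mid else (mid, high)
    return cell
```

After k levels the cell containing x has volume 2^{-k}, whatever the data look like: the partition is independent of D_n, which is what makes this model tractable.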
Centred KeRF vs centred random forests
[Figure: comparison of centred KeRF and centred random forests on two simulated models. Left: n = 800, d = 50, Y = X_1^2 + \exp(-X_2^2). Right: n = 600, d = 100, Y = -\sin(2 X_1) + X_2^2 + X_3 - \exp(-X_4) + \mathcal{N}(0, 0.5).]
Uniform KeRF vs uniform random forests
[Figure: comparison of uniform KeRF and uniform random forests on two simulated models. Left: n = 800, d = 50, Y = X_1^2 + \exp(-X_2^2). Right: n = 600, d = 100, Y = -\sin(2 X_1) + X_2^2 + X_3 - \exp(-X_4) + \mathcal{N}(0, 0.5).]
Analyzing KeRF estimates
Infinite KeRF estimate: \tilde{m}_{\infty,n}(x) = \frac{\sum_{i=1}^{n} Y_i K_k(x, X_i)}{\sum_{j=1}^{n} K_k(x, X_j)}.
It is a local averaging estimate and thus easier to analyze.
One common assumption on kernel estimates is that K_k(x, z) = K\!\left(\frac{x - z}{k}\right), which is not the case here. Thus, standard methods for kernel estimates cannot be directly adapted to our case.
In general, K_k(x, X_i) has no closed form (due to the complexity of the partitioning), but it can be computed for centred/uniform random forests.
Centred forests
For all x, z \in [0, 1]^d,

K_k^{cc}(x, z) = \sum_{\substack{k_1, \ldots, k_d \\ \sum_{j=1}^{d} k_j = k}} \frac{k!}{k_1! \cdots k_d!} \left( \frac{1}{d} \right)^{k} \prod_{m=1}^{d} \mathbf{1}_{\lceil 2^{k_m} x_m \rceil = \lceil 2^{k_m} z_m \rceil}.

Representations of z \mapsto K_k^{cc}((0.5, 0.5), z) for k = 1, 2, 5.
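The closed form above can be evaluated directly by enumerating the compositions (k_1, …, k_d) of k. A small sketch (exponential in k and d, so illustration only; the helper names are hypothetical):

```python
from math import ceil, factorial

def kernel_cc(x, z, k):
    """Centred-forest kernel K_k^cc(x, z) on [0, 1]^d via the closed form:
    multinomial weight times a product of cell-matching indicators."""
    d = len(x)

    def compositions(m, remaining):
        # enumerate (k_m, ..., k_d) of nonnegative integers summing to `remaining`
        if m == d - 1:
            yield (remaining,)
            return
        for km in range(remaining + 1):
            for tail in compositions(m + 1, remaining - km):
                yield (km,) + tail

    total = 0.0
    for ks in compositions(0, k):
        if all(ceil(2 ** km * xm) == ceil(2 ** km * zm)
               for km, xm, zm in zip(ks, x, z)):
            weight = factorial(k)
            for km in ks:
                weight //= factorial(km)
            total += weight / d ** k
    return total
```

Sanity check: when x = z every indicator is 1, and the multinomial weights sum to 1, so K_k^{cc}(x, x) = 1.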
Uniform forests
For all z \in [0, 1]^d,

K_k^{uf}(0, z) = \sum_{\substack{k_1, \ldots, k_d \\ \sum_{j=1}^{d} k_j = k}} \frac{k!}{k_1! \cdots k_d!} \left( \frac{1}{d} \right)^{k} \prod_{m=1}^{d} z_m \sum_{j=k_m}^{\infty} \frac{(-\ln z_m)^j}{j!}.

Representations of z \mapsto K_k^{uf}(0, (z_1 - 0.5, z_2 - 0.5)) for k = 1, 2, 5.
Rate of consistency of KeRF
Centred KeRF
Assume that m is Lipschitz. Then, provided 2^k/n \to 0 and k \to \infty,

\mathbb{E}\left[ \tilde{m}^{cc}_{\infty,n}(X) - m(X) \right]^2 \leq C_1 n^{-1/(3 + d \log 2)} (\log n)^2.

Uniform KeRF
Assume that m is Lipschitz. Then, provided 2^k/n \to 0 and k \to \infty,

\mathbb{E}\left[ \tilde{m}^{uf}_{\infty,n}(X) - m(X) \right]^2 \leq C n^{-1/(3 + 1.5\, d \log 2)} (\log n)^2.

Minimax rate for Lipschitz functions: n^{-1/(1 + 0.5 d)}.
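The exponents in these bounds can be compared numerically with the minimax exponent. A small sketch (rate exponents α such that the error behaves like n^{-α}, up to log factors):

```python
from math import log

def exponent_cc(d):
    """Centred KeRF exponent: 1 / (3 + d log 2)."""
    return 1.0 / (3.0 + d * log(2.0))

def exponent_uf(d):
    """Uniform KeRF exponent: 1 / (3 + 1.5 d log 2)."""
    return 1.0 / (3.0 + 1.5 * d * log(2.0))

def exponent_minimax(d):
    """Minimax exponent for Lipschitz functions: 1 / (1 + 0.5 d)."""
    return 1.0 / (1.0 + 0.5 * d)
```

For every d the ordering is minimax > centred KeRF > uniform KeRF, i.e. both KeRF bounds are slower than the minimax rate.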
Summary of KeRF
Pros
KeRF and random forests are close in terms of accuracy.
KeRF estimates are more amenable to analysis, since they are kernel estimates.
The weight function K_k is related to the shape of the partitions.
Cons
Computing the infinite kernel K_k is time consuming.
Breiman KeRF is difficult to analyze, since its kernel K_k depends on the data set.
1 Construction of random forests
2 Random forests and kernel methods
3 Consistency of Breiman forests
Tree consistency
For a tree whose construction is independent of the data, if
1 diam(A_n(X)) \to 0, in probability;
2 N_n(A_n(X)) \to \infty, in probability;
then the tree is consistent, that is,

\lim_{n \to \infty} \mathbb{E}\left| m_n(X) - m(X) \right|^2 = 0.
Consistency of centred random forest
Estimation error [Biau, 2012]
Under proper assumptions on the regression model,

\mathbb{E}\left[ m^{cc}_{\infty,n}(X) - \tilde{m}^{cc}_{\infty,n}(X) \right]^2 \leq C \sigma^2 \frac{2^{k_n}}{n\, k_n^{1/2}}.

Approximation error [Biau, 2012]
Under proper assumptions on the regression model,

\mathbb{E}\left[ \tilde{m}^{cc}_{\infty,n}(X) - m(X) \right]^2 \leq 2 d L^2\, 2^{-0.75\, k_n/(d \log 2)} + \|m\|_\infty^2\, e^{-n/2^{k_n}}.

If the forest is fully grown, that is, if k_n = \lfloor \log_2 n \rfloor, these bounds become

Estimation error: \leq C \sigma^2 (\log_2 n)^{-1/2};
Approximation error: \leq 2 d L^2\, n^{-0.75/(d \log 2)} + \|m\|_\infty^2 \times e^{-n/2^{k_n}},

where the last factor e^{-n/2^{k_n}} stays of constant order, so the approximation error no longer vanishes.
Algorithm for Breiman random forests
Randomness for Breiman random forests
Data sampling: bootstrap.
At each cell, randomly select mtry coordinates among {1, \ldots, d}.
Choose the split by minimizing the CART-split criterion in the cell along the mtry selected coordinates.
Stop when each cell contains exactly one point.
Algorithm for Breiman random forests
Randomness for Breiman random forests
Data sampling: subsampling, that is, choosing a_n points among n, with a_n < n.
At each cell, randomly select mtry coordinates among {1, \ldots, d}.
Choose the split by minimizing the CART-split criterion in the cell along the mtry selected coordinates.
Stop when the number of cells is exactly t_n.
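The modified sampling and stopping rules can be sketched as follows, with split selection abstracted into a callback (all names are illustrative, not the author's code):

```python
import random

def subsample(n, a_n, rng):
    """Choose a_n distinct indices among n (subsampling, not bootstrap)."""
    return rng.sample(range(n), a_n)

def grow_partition(indices, t_n, split_cell, rng):
    """Split cells, largest first, until the partition has exactly t_n
    cells (or no cell can be split further)."""
    cells = [list(indices)]
    while len(cells) < t_n:
        cell = max(cells, key=len)
        if len(cell) < 2:
            break  # every remaining cell holds a single point
        left, right = split_cell(cell, rng)
        cells.remove(cell)
        cells.extend([left, right])
    return cells
```

The stopping rule is the only change to the tree shape: the number of cells is fixed at t_n instead of growing one cell per point.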
Assumption (H1)
Additive regression model:

Y = \sum_{i=1}^{d} m_i(X^{(i)}) + \varepsilon,

where
X is uniformly distributed on [0, 1]^d,
\varepsilon \sim \mathcal{N}(0, \sigma^2) with \varepsilon independent of X,
each model component m_i is continuous.
Consistency
Theorem [S. et al., 2014]
Assume that (H1) is satisfied. Then, provided a_n \to \infty and t_n (\log a_n)^9 / a_n \to 0, random forests are consistent, i.e.,

\lim_{n \to \infty} \mathbb{E}\left[ m_{\infty,n}(X) - m(X) \right]^2 = 0.

Remarks
First consistency result for Breiman’s original forests.
Consistency of CART.
Sparsity and random forests
Assume that

Y = \sum_{i=1}^{S} m_i(X^{(i)}) + \varepsilon,

for some S < d.
Denote by j_{1,n}(X), \ldots, j_{k,n}(X) the first k cut directions used to construct the cell containing X.
Proposition [S. et al., 2014]
Let k \in \mathbb{N}^* and \xi > 0. Under appropriate assumptions, with probability 1 - \xi, for all n large enough, we have, for all 1 \leq q \leq k,

j_{q,n}(X) \in \{1, \ldots, S\}.
G. Biau. Analysis of a random forests model. Journal of Machine Learning Research, 13:1063–1095, 2012.
G. Biau, L. Devroye, and G. Lugosi. Consistency of random forests and other averaging classifiers. Journal of Machine Learning Research, 9:2015–2033, 2008.
L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.
S. Clémençon, M. Depecker, and N. Vayatis. Ranking forests. Journal of Machine Learning Research, 14(1):39–73, 2013.
R. Genuer. Variance reduction in purely random forests. Journal of Nonparametric Statistics, 24:543–562, 2012.
P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63:3–42, 2006.
H. Ishwaran and U. Kogalur. Consistency of random survival forests. Statistics & Probability Letters, 80:1056–1064, 2010.
H. Ishwaran, U. Kogalur, E. Blackstone, and M. Lauer. Random survival forests. The Annals of Applied Statistics, 2(3):841–860, 2008.
N. Meinshausen. Quantile regression forests. Journal of Machine Learning Research, 7:983–999, 2006.
L. Mentch and G. Hooker. Ensemble trees and CLTs: Statistical inference for supervised learning. arXiv:1404.6473, 2014.
E. Scornet, G. Biau, and J.-P. Vert. Consistency of random forests. arXiv:1405.2881, 2014.
S. Wager. Asymptotic theory for random forests. arXiv:1405.0352, 2014.
R. Zhu, D. Zeng, and M.R. Kosorok. Reinforcement learning trees. 2012.
Thank you for your attention!