Présentation G.Biau Random Forests

Description

Introduction to random forests and the sparse case.

Transcript of Présentation G.Biau Random Forests

Page 1: Présentation G.Biau Random Forests

Random forests

Gérard Biau

Groupe de travail prévision, May 2011

Page 2: Présentation G.Biau Random Forests

Outline

1 Setting

2 A random forests model

3 Layered nearest neighbors and random forests

Page 4: Présentation G.Biau Random Forests

Leo Breiman (1928-2005)

Page 6: Présentation G.Biau Random Forests

Classification

[Figure sequence: labeled training points, a new query point "?", and its predicted class.]

Page 9: Présentation G.Biau Random Forests

Regression

[Figure sequence: training points labeled with real-valued responses (1.09, 5.89, −2.33, ...), a new query point "?", and its predicted value 2.25.]

Page 12: Présentation G.Biau Random Forests

Trees

[Figure sequence: a tree is grown by recursive splits of the input space; the cell containing the query point "?" determines the prediction.]

Page 34: Présentation G.Biau Random Forests

From trees to forests

Leo Breiman promoted random forests.

Idea: Using tree averaging as a means of obtaining good rules.

The base trees are simple and randomized.

Breiman’s ideas were decisively influenced by

Amit and Geman (1997, geometric feature selection).

Ho (1998, random subspace method).

Dietterich (2000, random split selection approach).

stat.berkeley.edu/users/breiman/RandomForests

Page 39: Présentation G.Biau Random Forests

Random forests

They have emerged as serious competitors to state-of-the-art methods.

They are fast and easy to implement, produce highly accurate predictions, and can handle a very large number of input variables without overfitting.

In fact, forests are among the most accurate general-purpose learners available.

The algorithm is difficult to analyze, and its mathematical properties remain to date largely unknown.

Most theoretical studies have concentrated on isolated parts or stylized versions of the procedure.

Page 44: Présentation G.Biau Random Forests

Key references

Breiman (2000, 2001, 2004).

. Definition, experiments and intuitions.

Lin and Jeon (2006).

. Link with layered nearest neighbors.

Biau, Devroye and Lugosi (2008).

. Consistency results for stylized versions.

Biau (2010).

. Sparsity and random forests.

Page 48: Présentation G.Biau Random Forests

Outline

1 Setting

2 A random forests model

3 Layered nearest neighbors and random forests

Page 49: Présentation G.Biau Random Forests

Three basic ingredients

1-Randomization and no-pruning. For each tree, select at random, at each node, a small group of input coordinates to split on.

. Calculate the best split based on these features and cut.

. The tree is grown to maximum size, without pruning.

Page 52: Présentation G.Biau Random Forests

Three basic ingredients

2-Aggregation. Final predictions are obtained by aggregating over the ensemble.

. It is fast and easily parallelizable.

Page 54: Présentation G.Biau Random Forests

Three basic ingredients

3-Bagging. The subspace randomization scheme is blended with bagging.

. Breiman (1996).

. Bühlmann and Yu (2002).

. Biau, Cérou and Guyader (2010).
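To make the three ingredients concrete, here is a minimal sketch in Python of a randomized, unpruned tree ensemble for regression. It illustrates the scheme of the last three slides, not Breiman's reference implementation; the names and defaults (`grow_tree`, `n_trees`, `m_try`, `min_leaf`) are ours.

```python
import numpy as np

def grow_tree(X, y, m_try, rng, min_leaf=1):
    """Ingredient 1: at each node, try m_try randomly chosen coordinates,
    split on the best of them, and grow to maximum size (no pruning)."""
    n, d = X.shape
    if n <= min_leaf or np.all(y == y[0]):
        return ("leaf", float(y.mean()))
    best = None
    for j in rng.choice(d, size=m_try, replace=False):
        for s in np.unique(X[:, j])[:-1]:                 # candidate cut points
            left = X[:, j] <= s
            # decrease of the within-node sum of squares
            gain = (y.var() * n - y[left].var() * left.sum()
                    - y[~left].var() * (~left).sum())
            if best is None or gain > best[0]:
                best = (gain, j, s, left)
    if best is None:
        return ("leaf", float(y.mean()))
    _, j, s, left = best
    return ("node", j, s,
            grow_tree(X[left], y[left], m_try, rng, min_leaf),
            grow_tree(X[~left], y[~left], m_try, rng, min_leaf))

def predict_tree(tree, x):
    while tree[0] == "node":
        _, j, s, left, right = tree
        tree = left if x[j] <= s else right
    return tree[1]

def forest_predict(X, y, x, n_trees=100, m_try=2, seed=0):
    """Ingredient 3: bootstrap each tree (bagging);
    ingredient 2: aggregate the ensemble by averaging."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))        # bootstrap sample
        preds.append(predict_tree(grow_tree(X[idx], y[idx], m_try, rng), x))
    return float(np.mean(preds))
```

Keeping `m_try` well below $d$ is what makes the base trees simple and weakly correlated; the averaging step then reduces the variance of the ensemble.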

Page 56: Présentation G.Biau Random Forests

Mathematical framework

A training sample: $\mathcal{D}_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$, i.i.d. $[0,1]^d \times \mathbb{R}$-valued random variables.

A generic pair: $(X, Y)$ satisfying $\mathbb{E}Y^2 < \infty$.

Our mission: For fixed $x \in [0,1]^d$, estimate the regression function $r(x) = \mathbb{E}[Y \mid X = x]$ using the data.

Quality criterion: $\mathbb{E}[r_n(X) - r(X)]^2$.

Page 60: Présentation G.Biau Random Forests

The model

A random forest is a collection of randomized base regression trees $\{r_n(x, \Theta_m, \mathcal{D}_n), m \geq 1\}$.

These random trees are combined to form the aggregated regression estimate
$$\bar{r}_n(X, \mathcal{D}_n) = \mathbb{E}_\Theta\left[r_n(X, \Theta, \mathcal{D}_n)\right].$$

$\Theta$ is assumed to be independent of $X$ and the training sample $\mathcal{D}_n$.

However, we allow $\Theta$ to be based on a test sample, independent of, but distributed as, $\mathcal{D}_n$.

Page 64: Présentation G.Biau Random Forests

The procedure

. Fix $k_n \geq 2$ and repeat the following procedure $\lceil \log_2 k_n \rceil$ times:

1 At each node, a coordinate of $X = (X^{(1)}, \ldots, X^{(d)})$ is selected, with the $j$-th feature having a probability $p_{nj} \in (0,1)$ of being selected.

2 At each node, once the coordinate is selected, the split is at the midpoint of the chosen side.

. Thus
$$\bar{r}_n(X) = \mathbb{E}_\Theta\left[\frac{\sum_{i=1}^n Y_i \mathbf{1}_{[X_i \in A_n(X,\Theta)]}}{\sum_{i=1}^n \mathbf{1}_{[X_i \in A_n(X,\Theta)]}}\, \mathbf{1}_{E_n(X,\Theta)}\right],$$
where $A_n(X, \Theta)$ is the cell of the resulting random partition containing $X$ and
$$E_n(X,\Theta) = \left[\sum_{i=1}^n \mathbf{1}_{[X_i \in A_n(X,\Theta)]} \neq 0\right].$$
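A minimal Monte Carlo sketch of this estimate in Python, under the slide's assumptions (midpoint splits, coordinate $j$ drawn with probability $p_{nj}$); the helper names are ours, and the expectation $\mathbb{E}_\Theta$ is approximated by an average over independently drawn trees:

```python
import math
import numpy as np

def cell_of(x, p, k, rng):
    """The rectangular cell A_n(x, Theta): ceil(log2 k) midpoint splits,
    each split coordinate j drawn with probability p[j]."""
    lo, hi = np.zeros(len(x)), np.ones(len(x))
    for _ in range(math.ceil(math.log2(k))):
        j = rng.choice(len(x), p=p)     # feature j chosen with probability p_nj
        mid = (lo[j] + hi[j]) / 2.0     # cut at the midpoint of the chosen side
        if x[j] <= mid:
            hi[j] = mid
        else:
            lo[j] = mid
    return lo, hi

def forest_estimate(X, y, x, p, k, n_trees=200, seed=0):
    """Average over n_trees draws of Theta of the within-cell mean of the Y_i."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_trees):
        lo, hi = cell_of(x, p, k, rng)
        in_cell = np.all((X >= lo) & (X <= hi), axis=1)
        # the indicator of E_n: an empty cell contributes 0
        vals.append(y[in_cell].mean() if in_cell.any() else 0.0)
    return float(np.mean(vals))
```

For instance, calling `forest_estimate` with `p = np.full(d, 1/d)` gives the purely random model discussed in the consistency theorem below.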

Page 67: Présentation G.Biau Random Forests

Binary trees

[Figure sequence: recursive splits of the unit square and the corresponding binary trees.]

Page 72: Présentation G.Biau Random Forests

General comments

Each individual tree has exactly $2^{\lceil \log_2 k_n \rceil}$ ($\approx k_n$) terminal nodes, and each leaf has Lebesgue measure $2^{-\lceil \log_2 k_n \rceil}$ ($\approx 1/k_n$).

If $X$ has uniform distribution on $[0,1]^d$, there will be on average about $n/k_n$ observations per terminal node.

The choice $k_n = n$ induces a very small number of cases in the final leaves.

This scheme is close to what the original random forests algorithm does.
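A quick order-of-magnitude check (values chosen for illustration):

```python
import math

n, k_n = 100_000, 1_000
leaves = 2 ** math.ceil(math.log2(k_n))   # 1024 terminal nodes per tree
print(leaves, 1 / leaves, n / k_n)        # leaf measure ~ 1/k_n, ~100 points per leaf
```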

Page 76: Présentation G.Biau Random Forests

Consistency

Theorem
Assume that the distribution of $X$ has support on $[0,1]^d$. Then the random forests estimate $\bar{r}_n$ is consistent whenever $p_{nj} \log k_n \to \infty$ for all $j = 1, \ldots, d$ and $k_n / n \to 0$ as $n \to \infty$.

In the purely random model, $p_{nj} = 1/d$, independently of $n$ and $j$, and consistency is ensured as long as $k_n \to \infty$ and $k_n / n \to 0$.

This is, however, a radically simplified version of the random forests used in practice.

A more in-depth analysis is needed.

Page 80: Présentation G.Biau Random Forests

Sparsity

There is empirical evidence that many signals in high-dimensional spaces admit a sparse representation.

. Wavelet coefficients of images.

. High-throughput technologies.

Sparse estimation is playing an increasingly important role in the statistics and machine learning communities.

Several methods have recently been developed in both fields which rely upon the notion of sparsity.

Page 85: Présentation G.Biau Random Forests

Our vision

The regression function $r(X) = \mathbb{E}[Y \mid X]$ in fact depends only on a nonempty subset $\mathcal{S}$ (for Strong) of the $d$ features.

In other words, letting $X_{\mathcal{S}} = (X_j : j \in \mathcal{S})$ and $S = \operatorname{Card}\, \mathcal{S}$, we have
$$r(X) = \mathbb{E}[Y \mid X_{\mathcal{S}}].$$

In the dimension reduction scenario we have in mind, the ambient dimension $d$ can be very large, much larger than $n$.

As such, the value $S$ characterizes the sparsity of the model: the smaller $S$, the sparser $r$.

Page 89: Présentation G.Biau Random Forests

Sparsity and random forests

Ideally, $p_{nj} = 1/S$ for $j \in \mathcal{S}$.

To stick to reality, we will rather require that $p_{nj} = (1/S)(1 + \xi_{nj})$.

Such a randomization mechanism may be designed on the basis of a test sample.

Action plan
$$\mathbb{E}[\bar{r}_n(X) - r(X)]^2 = \mathbb{E}[\bar{r}_n(X) - \tilde{r}_n(X)]^2 + \mathbb{E}[\tilde{r}_n(X) - r(X)]^2,$$
where
$$\tilde{r}_n(X) = \sum_{i=1}^n \mathbb{E}_\Theta[W_{ni}(X, \Theta)]\, r(X_i),$$
the $W_{ni}(X, \Theta)$ being the local averaging weights of the tree estimate above. The first term is a variance term and the second a bias term; they are bounded in the two propositions below.

Page 93: Présentation G.Biau Random Forests

Variance

Proposition
Assume that $X$ is uniformly distributed on $[0,1]^d$ and, for all $x \in \mathbb{R}^d$,
$$\sigma^2(x) = \mathbb{V}[Y \mid X = x] \leq \sigma^2$$
for some positive constant $\sigma^2$. Then, if $p_{nj} = (1/S)(1 + \xi_{nj})$ for $j \in \mathcal{S}$,
$$\mathbb{E}[\bar{r}_n(X) - \tilde{r}_n(X)]^2 \leq C \sigma^2 \left(\frac{S^2}{S-1}\right)^{S/2d} \frac{(1 + \xi_n)\, k_n}{n\, (\log k_n)^{S/2d}},$$
where
$$C = \frac{288}{\pi} \left(\frac{\pi \log 2}{16}\right)^{S/2d}.$$

Page 94: Présentation G.Biau Random Forests

Bias

Proposition
Assume that $X$ is uniformly distributed on $[0,1]^d$ and $r$ is $L$-Lipschitz. Then, if $p_{nj} = (1/S)(1 + \xi_{nj})$ for $j \in \mathcal{S}$,
$$\mathbb{E}[\tilde{r}_n(X) - r(X)]^2 \leq \frac{2SL^2}{k_n^{\frac{0.75}{S \log 2}(1 + \gamma_n)}} + \left[\sup_{x \in [0,1]^d} r^2(x)\right] e^{-n/(2k_n)}.$$

The rate at which the bias decreases to 0 depends on the number of strong variables, not on $d$.

$k_n^{-(0.75/(S \log 2))(1 + \gamma_n)} = o\big(k_n^{-2/d}\big)$ as soon as $S \leq \lfloor 0.54\, d \rfloor$.

The term $e^{-n/(2k_n)}$ prevents the extreme choice $k_n = n$.

Page 98: Présentation G.Biau Random Forests

Main result

Theorem
If $p_{nj} = (1/S)(1 + \xi_{nj})$ for $j \in \mathcal{S}$, with $\xi_{nj} \log n \to 0$ as $n \to \infty$, then for the choice
$$k_n \propto \left(\frac{L^2}{\Xi}\right)^{1/\left(1 + \frac{0.75}{S \log 2}\right)} n^{1/\left(1 + \frac{0.75}{S \log 2}\right)},$$
we have
$$\limsup_{n \to \infty}\; \sup_{(X,Y) \in \mathcal{F}}\; \frac{\mathbb{E}[\bar{r}_n(X) - r(X)]^2}{\left(\Xi L^{\frac{2S \ln 2}{0.75}}\right)^{\frac{0.75}{S \ln 2 + 0.75}} n^{\frac{-0.75}{S \log 2 + 0.75}}} \leq \Lambda.$$

Take-home message
The rate $n^{\frac{-0.75}{S \log 2 + 0.75}}$ is strictly faster than the usual minimax rate $n^{-2/(d+2)}$ as soon as $S \leq \lfloor 0.54\, d \rfloor$.
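A quick numerical illustration of the take-home message (a sketch; $d = 100$ matches the figure on the next page):

```python
import math

def forest_exponent(S):
    """Exponent of 1/n in the sparse rate n^{-0.75/(S log 2 + 0.75)}."""
    return 0.75 / (S * math.log(2) + 0.75)

def minimax_exponent(d):
    """Exponent of 1/n in the usual minimax rate n^{-2/(d+2)}."""
    return 2.0 / (d + 2)

d = 100
print(minimax_exponent(d))                     # 0.0196...
for S in (5, 10, 54):                          # floor(0.54 d) = 54
    print(S, forest_exponent(S), forest_exponent(S) > minimax_exponent(d))
```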

Page 100: Présentation G.Biau Random Forests

Dimension reduction

[Figure: the rate exponent plotted as a function of the number of strong variables $S$ (for $d = 100$), with the minimax exponent $2/(d+2) = 0.0196$ marked for comparison.]

Page 101: Présentation G.Biau Random Forests

Discussion — Bagging

The optimal parameter $k_n$ depends on the unknown distribution of $(X, Y)$.

To correct this situation, adaptive choices of $k_n$ should preserve the rate of convergence of the estimate.

Another route we may follow is to analyze the effect of bagging.

Biau, Cérou and Guyader (2010).

Pages 105-115: Présentation G.Biau Random Forests

[Figure-only slides.]

Page 116: Présentation G.Biau Random Forests

Discussion — Choosing the pnj ’s

Imaginary scenario
The following splitting scheme is iteratively repeated at each node:

1 Select at random $M_n$ candidate coordinates to split on.

2 If the selection is all weak, then choose one at random to split on.

3 If there is more than one strong variable elected, choose one at random and cut.

Page 120: Présentation G.Biau Random Forests

Discussion — Choosing the pnj ’s

Each coordinate in $\mathcal{S}$ will be cut with the "ideal" probability
$$p_n^\star = \frac{1}{S}\left[1 - \left(1 - \frac{S}{d}\right)^{M_n}\right].$$

The parameter $M_n$ should satisfy
$$\left(1 - \frac{S}{d}\right)^{M_n} \log n \to 0 \quad \text{as } n \to \infty.$$

This is true as soon as
$$M_n \to \infty \quad \text{and} \quad \frac{M_n}{\log n} \to \infty \quad \text{as } n \to \infty.$$
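A quick Monte Carlo check of the closed form for $p_n^\star$ (a sketch; candidates are drawn with replacement, the convention under which the $(1 - S/d)^{M_n}$ term is exact):

```python
import numpy as np

def cut_probability(d=100, S=10, M_n=50, trials=100_000, seed=0):
    """Empirical probability that strong coordinate 0 is cut under the
    imaginary scenario, next to the closed form p*_n above."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        cand = rng.integers(0, d, size=M_n)     # M_n candidates (coords 0..S-1 strong)
        strong = cand[cand < S]
        pool = strong if strong.size else cand  # all weak: cut a candidate at random
        hits += int(rng.choice(pool) == 0)
    p_star = (1 - (1 - S / d) ** M_n) / S
    return hits / trials, p_star
```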

Page 123: Présentation G.Biau Random Forests

Assumptions
We have at hand an independent test set $\mathcal{D}'_n$.

The model is linear:
$$Y = \sum_{j \in \mathcal{S}} a_j X^{(j)} + \varepsilon.$$

For a fixed node $A = \prod_{j=1}^d A_j$, fix a coordinate $j$ and look at the weighted conditional variance $\mathbb{V}[Y \mid X^{(j)} \in A_j]\, \mathbb{P}(X^{(j)} \in A_j)$.

If $j \in \mathcal{S}$, then the best split is at the midpoint of the node, with a variance decrease equal to $a_j^2/16 > 0$.

If $j \in \mathcal{W}$, the decrease of the variance is always 0, whatever the location of the split.
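Where the $a_j^2/16$ comes from (a one-step check under the slide's assumptions, taking the side $A_j$ of length 1 and $X^{(j)}$ uniform on it):

```latex
% Var of a uniform variable on a side of length 1 is 1/12; after a midpoint cut,
% each half has length 1/2 (variance 1/48) and conditional probability 1/2, so the
% decrease of the a_j-part of the weighted conditional variance is
a_j^2\left[\frac{1}{12}
  - \left(\frac{1}{2}\cdot\frac{1}{48} + \frac{1}{2}\cdot\frac{1}{48}\right)\right]
  = a_j^2\left(\frac{1}{12} - \frac{1}{48}\right)
  = \frac{a_j^2}{16}.
```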

Page 129: Présentation G.Biau Random Forests

Discussion — Choosing the pnj ’s

Near-reality scenario
The following splitting scheme is iteratively repeated at each node:

1 Select at random $M_n$ candidate coordinates to split on.

2 For each of the $M_n$ elected coordinates, calculate the best split.

3 Select the coordinate which outputs the best within-node sum of squares decrease, and cut.

Conclusion
For $j \in \mathcal{S}$,
$$p_{nj} \approx \frac{1}{S}\left(1 + \xi_{nj}\right),$$
where $\xi_{nj} \to 0$ and satisfies the constraint $\xi_{nj} \log n \to 0$ as $n$ tends to infinity, provided $k_n \log n / n \to 0$, $M_n \to \infty$ and $M_n / \log n \to \infty$.

Page 134: Présentation G.Biau Random Forests

Outline

1 Setting

2 A random forests model

3 Layered nearest neighbors and random forests

Pages 135-158: Présentation G.Biau Random Forests

[Figure-only slides.]

Page 159: Présentation G.Biau Random Forests

Layered Nearest Neighbors

Definition
Let $X_1, \ldots, X_n$ be a sample of i.i.d. random vectors in $\mathbb{R}^d$, $d \geq 2$. An observation $X_i$ is said to be an LNN of a point $x$ if the hyperrectangle defined by $x$ and $X_i$ contains no other data points.

[Figure: a point $x$, an LNN of $x$, and the empty hyperrectangle they define.]
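A direct way to compute the LNN set (a sketch, quadratic in $n$ for clarity):

```python
import numpy as np

def layered_nearest_neighbors(X, x):
    """Indices i such that the axis-aligned rectangle spanned by x and X[i]
    contains no other sample point."""
    lo = np.minimum(X, x)            # per-point rectangle corners
    hi = np.maximum(X, x)
    lnn = []
    for i in range(len(X)):
        inside = np.all((X >= lo[i]) & (X <= hi[i]), axis=1)
        inside[i] = False            # the point itself does not count
        if not inside.any():
            lnn.append(i)
    return lnn
```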

Page 160: Présentation G.Biau Random Forests

What is known about $L_n(x)$?

... a lot when $X_1, \ldots, X_n$ are uniformly distributed over $[0,1]^d$. (Here $L_n(x)$ denotes the number of LNNs of $x$.)

For example,
$$\mathbb{E}L_n(x) = \frac{2^d (\log n)^{d-1}}{(d-1)!} + O\big((\log n)^{d-2}\big)$$
and
$$\frac{(d-1)!\, L_n(x)}{2^d (\log n)^{d-1}} \to 1 \quad \text{in probability as } n \to \infty.$$

This is the problem of maxima in random vectors (Barndorff-Nielsen and Sobel, 1966).

Page 163: Présentation G.Biau Random Forests

Two results (Biau and Devroye, 2010)

Model
$X_1, \ldots, X_n$ are independently distributed according to some probability density $f$ (with probability measure $\mu$).

Theorem
For $\mu$-almost all $x \in \mathbb{R}^d$, one has
$$L_n(x) \to \infty \quad \text{in probability as } n \to \infty.$$

Theorem
Suppose that $f$ is $\lambda$-almost everywhere continuous. Then
$$\frac{(d-1)!\, \mathbb{E}L_n(x)}{2^d (\log n)^{d-1}} \to 1 \quad \text{as } n \to \infty,$$
at $\mu$-almost all $x \in \mathbb{R}^d$.
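A small simulation check of the second theorem (a sketch reusing `layered_nearest_neighbors` from the previous page; at finite $n$ the agreement is only up to the $O((\log n)^{d-2})$ correction):

```python
import math
import numpy as np

def mean_lnn_count(n=1000, d=2, reps=50, seed=0):
    """Monte Carlo estimate of E L_n(x) at x = (1/2, ..., 1/2) for uniform data,
    next to the leading term 2^d (log n)^{d-1} / (d-1)!."""
    rng = np.random.default_rng(seed)
    x = np.full(d, 0.5)
    counts = [len(layered_nearest_neighbors(rng.random((n, d)), x))
              for _ in range(reps)]
    leading = 2**d * math.log(n)**(d - 1) / math.factorial(d - 1)
    return float(np.mean(counts)), leading
```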

Page 166: Présentation G.Biau Random Forests

LNN regression estimation

Model
$(X, Y), (X_1, Y_1), \ldots, (X_n, Y_n)$ are i.i.d. random vectors of $\mathbb{R}^d \times \mathbb{R}$. Moreover, $|Y|$ is bounded and $X$ has a density.

The regression function $r(x) = \mathbb{E}[Y \mid X = x]$ may be estimated by
$$r_n(x) = \frac{1}{L_n(x)} \sum_{i=1}^n Y_i \mathbf{1}_{[X_i \in \mathcal{L}_n(x)]},$$
where $\mathcal{L}_n(x)$ denotes the set of LNNs of $x$ (so that $L_n(x) = \operatorname{Card}\, \mathcal{L}_n(x)$).

1 No smoothing parameter.

2 A scale-invariant estimate.

3 Intimately connected to Breiman's random forests.
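Combined with the `layered_nearest_neighbors` sketch above, the estimate is two lines (illustrative only):

```python
import numpy as np

def lnn_regression(X, y, x):
    """r_n(x): average the responses over the LNNs of x. The LNN set is never
    empty: the point whose rectangle with x is inclusion-minimal is an LNN."""
    idx = layered_nearest_neighbors(X, x)
    return float(np.mean(y[idx]))
```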

Page 171: Présentation G.Biau Random Forests

Consistency

Theorem (Pointwise $L^p$-consistency)
Assume that the regression function $r$ is $\lambda$-almost everywhere continuous and that $Y$ is bounded. Then, for $\mu$-almost all $x \in \mathbb{R}^d$ and all $p \geq 1$,
$$\mathbb{E}|r_n(x) - r(x)|^p \to 0 \quad \text{as } n \to \infty.$$

Theorem (Global $L^p$-consistency)
Under the same conditions, for all $p \geq 1$,
$$\mathbb{E}|r_n(X) - r(X)|^p \to 0 \quad \text{as } n \to \infty.$$

1 No universal consistency result with respect to $r$ is possible.

2 The results do not impose any condition on the density.

3 They are also scale-free.

Page 176: Présentation G.Biau Random Forests

Back to random forests

A random forest can be viewed as a weighted LNN regression estimate
$$\bar{r}_n(x) = \sum_{i=1}^n Y_i W_{ni}(x),$$
where the weights concentrate on the LNNs and satisfy
$$\sum_{i=1}^n W_{ni}(x) = 1.$$
(Lin and Jeon, 2006: when the trees are grown until each terminal node contains a single observation, only the LNNs of $x$ can receive positive weight.)

Page 177: Présentation G.Biau Random Forests

Non-adaptive strategies

Consider the non-adaptive random forests estimate
$$\bar{r}_n(x) = \sum_{i=1}^n Y_i W_{ni}(x),$$
where the weights concentrate on the LNNs.

Proposition
For any $x \in \mathbb{R}^d$, assume that $\sigma^2 = \mathbb{V}[Y \mid X = x]$ is independent of $x$. Then
$$\mathbb{E}[\bar{r}_n(x) - r(x)]^2 \geq \frac{\sigma^2}{\mathbb{E}L_n(x)}.$$
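Plugging the asymptotics of $\mathbb{E}L_n(x)$ from the previous slides into this lower bound gives the slow logarithmic rate stated on the next slide; numerically (a sketch):

```python
import math

def lnn_lower_bound(n, d, sigma2=1.0):
    """sigma^2 / E L_n(x), using E L_n(x) ~ 2^d (log n)^{d-1} / (d-1)!."""
    return sigma2 * math.factorial(d - 1) / (2**d * math.log(n)**(d - 1))

for n in (10**3, 10**6, 10**9):
    print(n, lnn_lower_bound(n, d=5))   # decreases only logarithmically in n
```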

Page 179: Présentation G.Biau Random Forests

Rate of convergence

At $\mu$-almost all $x$, when $f$ is $\lambda$-almost everywhere continuous,
$$\mathbb{E}[\bar{r}_n(x) - r(x)]^2 \gtrsim \frac{\sigma^2 (d-1)!}{2^d (\log n)^{d-1}}.$$

Improving the rate of convergence
1 Stop as soon as a future rectangle split would cause a sub-rectangle to have fewer than $k_n$ points.

2 Resort to bagging and randomize using random subsamples.
