Semi-random model tree ensembles: an effective and scalable regression method
Bernhard Pfahringer
Department of Computer Science
University of Waikato, New Zealand
September 22nd, 2011
Background
Outline
1 Background
2 Algorithm
3 Results
4 Summary
Background
Local regression
Non-linear functions can be approximated by a set of locally linear estimators
Regression and model trees are fast multi-variate versions of local regression
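To make the idea concrete, here is a minimal, self-contained Python sketch (not from the slides): a noisy sine curve is approximated by two locally fitted straight lines, one per half of the input range.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, np.pi, 200)
y = np.sin(x) + rng.normal(scale=0.05, size=x.shape)

def fit_line(xs, ys):
    # ordinary least squares for y = a*x + b on one local region
    a, b = np.polyfit(xs, ys, deg=1)
    return a, b

split = np.pi / 2                       # boundary between the two local regions
left = x < split
models = {True: fit_line(x[left], y[left]),
          False: fit_line(x[~left], y[~left])}

def predict(x_new):
    a, b = models[x_new < split]        # pick the local linear estimator
    return a * x_new + b

print(predict(0.3), predict(2.8))       # piece-wise linear estimates of sin(x)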
Background
Piece-wise linear approximation example
Background
Sample Regression Tree: constants in the leaves
A159 <= −0.62 :
    A149 <= 0.52 : Y = 1.6977
    A149 > 0.52 : Y = 1.2213
A159 > −0.62 :
    A149 <= 0.638 :
        A57 <= −0.485 : Y = 0.8388
        A57 > −0.485 : Y = 1.0569
    A149 > 0.638 : Y = 0.6062
Background
Sample Model Tree: linear models in the leaves
A159 <= −0.62 :
    A149 <= 0.52 : LM1
    A149 > 0.52 : LM2
A159 > −0.62 :
    A149 <= 0.638 : LM3
    A149 > 0.638 : LM4

LM1: Y = −0.597 ∗ A149 − 0.211 ∗ A159 + 1.901
LM2: Y = −0.471 ∗ A149 − 0.211 ∗ A159 + 1.353
LM3: Y = −0.365 ∗ A149 − 0.232 ∗ A159 + 1.017
LM4: Y = −0.555 ∗ A149 − 0.232 ∗ A159 + 0.776
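Read as code, this tree routes an instance down the splits and then evaluates that leaf's linear model; a small Python rendering of the sample tree above (the feature values in the last line are made up for illustration):

def predict_model_tree(a149, a159):
    # route down the splits of the sample model tree, then apply the leaf's linear model
    if a159 <= -0.62:
        if a149 <= 0.52:                                    # LM1
            return -0.597 * a149 - 0.211 * a159 + 1.901
        return -0.471 * a149 - 0.211 * a159 + 1.353         # LM2
    if a149 <= 0.638:                                       # LM3
        return -0.365 * a149 - 0.232 * a159 + 1.017
    return -0.555 * a149 - 0.232 * a159 + 0.776             # LM4

print(predict_model_tree(a149=0.3, a159=-1.0))   # falls into LM1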
Algorithm
Outline
1 Background
2 Algorithm
3 Results
4 Summary
Algorithm
Ensembles of Semi-Random Model Trees
Ensembles usually improve results
Most ensembles use randomization to generate diversity
Two sources of randomness:
    For each tree: divide the data into a train and a validation set
    To split: select the best attribute from a random subset of all attributes
Algorithm
Single Semi-Random Model Tree
Only consider the median as split value (=> balanced trees)
Leaf model: a linear ridge regression model
Cap model predictions inside the observed extremes
Optimise tree depth and ridge value using the validation set
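A minimal sketch of such a leaf model, assuming a plain normal-equations ridge solver (the helper names fit_leaf/predict_leaf are mine, not the authors'); note how predictions are capped at the target extremes seen in the leaf:

import numpy as np

def fit_leaf(X, y, ridge):
    # ridge regression with an intercept; remember the target range seen in this leaf
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    d = X1.shape[1]
    w = np.linalg.solve(X1.T @ X1 + ridge * np.eye(d), X1.T @ y)
    return w, y.min(), y.max()

def predict_leaf(model, X):
    w, lo, hi = model
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.clip(X1 @ w, lo, hi)      # cap predictions inside the observed extremes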
Algorithm
Build ensemble
BUILDENSEMBLE(data, numTrees, k)
1  for i = 1 to numTrees
2      do randomly split data into two:
3             train + validate
4         BUILDTREE(train, validate, k)
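A Python rendering of this loop, as a sketch under my own naming: it assumes a build_tree/predict_tree pair as sketched after the BUILDTREE pseudocode below, and a 50/50 split into build and validation halves as used in the reported experiments.

import numpy as np

def build_ensemble(X, y, num_trees, k, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(num_trees):
        # per-tree randomness: a fresh random split into train and validation halves
        idx = rng.permutation(len(y))
        half = len(y) // 2
        tr, va = idx[:half], idx[half:]
        trees.append(build_tree(X[tr], y[tr], X[va], y[va], k, rng))
    return trees

def predict_ensemble(trees, X):
    # the ensemble prediction is the average of the individual tree predictions
    return np.mean([predict_tree(t, X) for t in trees], axis=0)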
Algorithm
BuildTree
BUILDTREE(train, validate, k)
 1  min ← MINTARGETVALUE(train)
 2  max ← MAXTARGETVALUE(train)
 3  localSSE ← LINREG(train, validate)
 4
 5  if |train| > 10 & |validate| > 10
 6      do split ← RANDOMSPLIT(train, k)
 7
 8         smT ← SMALLER(train, split)
 9         smV ← SMALLER(validate, split)
10         smaller ← BUILDTREE(smT, smV, k)
11
12         laT ← LARGER(train, split)
13         laV ← LARGER(validate, split)
14         larger ← BUILDTREE(laT, laV, k)
15
16         subSSE ← SSE(smaller, larger, validate)
17
18         if localSSE < subSSE
19             do smaller ← null
20                larger ← null
21         else
22             localModel ← null
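The same logic in Python, as a sketch: linreg, random_split and fit_leaf/predict_leaf are the helpers sketched around the neighbouring slides, and the Node container and boolean-mask splitting are my own simplifications, not the original implementation.

import numpy as np
from dataclasses import dataclass

@dataclass
class Node:
    model: object = None      # local ridge model, used when this node acts as a leaf
    attr: int = None          # split attribute index
    thresh: float = None      # split value (approximate median)
    smaller: "Node" = None
    larger: "Node" = None

def build_tree(Xtr, ytr, Xva, yva, k, rng):
    node = Node()
    node.model, local_sse = linreg(Xtr, ytr, Xva, yva)       # local linear model and its SSE
    if len(ytr) > 10 and len(yva) > 10:
        node.attr, node.thresh = random_split(Xtr, ytr, k, rng)
        if node.attr is None:                 # no usable split found: keep the leaf model
            return node
        tr_small = Xtr[:, node.attr] <= node.thresh
        va_small = Xva[:, node.attr] <= node.thresh
        node.smaller = build_tree(Xtr[tr_small], ytr[tr_small], Xva[va_small], yva[va_small], k, rng)
        node.larger = build_tree(Xtr[~tr_small], ytr[~tr_small], Xva[~va_small], yva[~va_small], k, rng)
        sub_sse = np.sum((predict_tree(node, Xva, use_local=False) - yva) ** 2)
        if local_sse < sub_sse:               # the flat local model wins: prune the subtrees
            node.smaller = node.larger = None
        else:                                 # the subtrees win: drop the local model
            node.model = None
    return node

def predict_tree(node, X, use_local=True):
    if node.smaller is None or (use_local and node.model is not None):
        return predict_leaf(node.model, X)
    go_small = X[:, node.attr] <= node.thresh
    out = np.empty(len(X))
    out[go_small] = predict_tree(node.smaller, X[go_small])
    out[~go_small] = predict_tree(node.larger, X[~go_small])
    return out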
Algorithm
Ridge regression
LINREG(train, validate)
1  for ridge in 10^-8, 10^-4, 10^-2, 10^-1, 1, 10
2      do model_ridge ← RIDGEREGRESS(train, ridge)
3         sse_ridge ← SSE(model_ridge, validate)
4  if bestModel == model_10
5      do build models for ridge = 10^2, 10^3, ...
6         and so on while improving
7  localModel ← bestModel
8  return minimum-sse-on-validation-data
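A minimal sketch of this ridge-value search in Python: fit one ridge model per candidate value, score it on the validation data, and keep extending the grid upwards (100, 1000, ...) while the largest value keeps winning. fit_leaf/predict_leaf are the leaf-model helpers sketched earlier; the grid is the one on the slide.

import numpy as np

def linreg(Xtr, ytr, Xva, yva):
    def score(ridge):
        model = fit_leaf(Xtr, ytr, ridge)
        return model, np.sum((predict_leaf(model, Xva) - yva) ** 2)

    grid = [1e-8, 1e-4, 1e-2, 1e-1, 1.0, 10.0]
    best_model, best_sse, best_ridge = None, np.inf, None
    for ridge in grid:
        model, sse = score(ridge)
        if sse < best_sse:
            best_model, best_sse, best_ridge = model, sse, ridge
    # keep growing the ridge value (100, 1000, ...) while it improves
    while best_ridge == grid[-1]:
        grid.append(grid[-1] * 10.0)
        model, sse = score(grid[-1])
        if sse < best_sse:
            best_model, best_sse, best_ridge = model, sse, grid[-1]
        else:
            break
    return best_model, best_sse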
Algorithm
Random split selection
RANDOMSPLIT(train, k)
1  for i = 1 to k
2      do splitAttr ← RANDOM_CHOICE(allAttrs)
3         stump ← STUMP(APPROX_MEDIAN(splitAttr))
4         compute SSE(stump, train)
5  return minimum-sse stump
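A sketch of the same selection in Python: try k randomly chosen attributes, split each at its median, and keep the stump with the lowest training SSE. Scoring a stump by the mean target of each side is my assumption about what SSE(stump, train) means; k must be at most the number of attributes.

import numpy as np

def random_split(Xtr, ytr, k, rng):
    best = (None, None, np.inf)
    for attr in rng.choice(Xtr.shape[1], size=k, replace=False):
        thresh = np.median(Xtr[:, attr])
        left = Xtr[:, attr] <= thresh
        if left.all() or not left.any():
            continue                          # degenerate split, skip this attribute
        sse = np.sum((ytr[left] - ytr[left].mean()) ** 2) \
            + np.sum((ytr[~left] - ytr[~left].mean()) ** 2)
        if sse < best[2]:
            best = (attr, thresh, sse)
    return best[0], best[1]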
Algorithm
Parameter Settings
reported experiments:
average the predictions of 50 randomized model trees
to split: select the best of a randomly chosen 50% of all attributes

generally: one should optimise separately for every application, e.g. using cross-validation

number of trees: "the more the merrier", but with diminishing returns
number of randomly selected attributes: 50% is a good default, but it may depend on the total number of attributes and on sparseness
Results
Outline
1 Background
2 Algorithm
3 Results
4 Summary
Results
Comparison
use more than 20 Torgo/UCI datasets, > 900 examples
repeated 2/3 training, 1/3 testing splits
training split into equal build and validation halves (1/3, 1/3)
preprocessed for missing or categorical values
compare to:
    LR: linear ridge regression, optimise the ridge value
    GP: Gaussian process regression, optimise the noise level and RBF gamma
    AG: additive groves, use the "fast" script
use RMAE: relative mean absolute error
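A small sketch of the RMAE score used in the following charts, assuming the usual definition (this definition is an assumption; the slides only expand the acronym): the model's mean absolute error relative to that of a baseline which always predicts the training-set mean, expressed as a percentage.

import numpy as np

def rmae(y_true, y_pred, y_train_mean):
    # relative mean absolute error: 100 * MAE(model) / MAE(predict-the-mean baseline)
    mae_model = np.mean(np.abs(y_true - y_pred))
    mae_baseline = np.mean(np.abs(y_true - y_train_mean))
    return 100.0 * mae_model / mae_baseline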
Results
RMAE on Torgo/UCI
[Bar chart "RMAE for Torgo/UCI data": RMAE on a 0-100 scale for each Torgo/UCI dataset, with bars for RMT, GP, LR, and AG.]
Figure: RMAE for Torgo/UCI datasets, sorted by the linear regression result.
Results
Build times on Torgo/UCI
[Bar chart "Training time in seconds for Torgo/UCI data": per-dataset training times on a logarithmic scale from 0.1 to 100000 seconds, with bars for RMT, GP, LR, and AG.]
Figure: Training time in seconds for Torgo/UCI datasets, sorted by the number of instances in each dataset; note the use of a logarithmic y-scale.
Results
UCI Census dataset
Table: Partial results, 2458285 examples in total, therefore about 800000 in the training fold.
Method   RMAE    Time (secs)
LR       15.96          1205
RMT       9.78         19811
GP        ?       ? (would need 5 TB RAM)
AG        ?       ? (estimated 2000000)
Results
Near infrared (NIR) Datasets
proprietary NIR data
7 datasets
from 255 up to 7500 spectra
between 170 and 500-odd features
preprocessed for noise and baseline shift
Results
Sample NIR spectrum
Preprocessed sample spectrum (nitrogen in soil)
[Line plot: a preprocessed NIR spectrum with roughly 170 wavelength channels and values between −2 and 4.]
Results
RMAE on NIR data
RMAE for NIR datasets
[Bar chart: RMAE on a 10-90 scale for the NIR datasets n, omd, rmd, tc, phe, ph, p5, na, and g5, with bars for RMT, GP, LR, and AG.]
Figure: RMAE for NIR datasets, sorted by the linear regression result.
Results
Build times on NIR data
Training time in seconds for NIR data
[Bar chart: per-dataset training times on a logarithmic scale from 0.1 to 100000 seconds for the NIR datasets omd, rmd, na, n, tc, ph, phe, p5, and g5, with bars for RMT, GP, LR, and AG.]
Figure: Training time in seconds for NIR datasets, sorted by the number of instances in each dataset; note the use of a logarithmic y-scale.
Results
Random Model Tree Build Times discussion
complexity is O(K · N · log N + K² · N)
the second term (linear model computation) seems to dominate
therefore the observed complexity is ≈ O(K² · N)
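One plausible reading of where the two terms come from, assuming K is the number of attributes and N the number of training instances (this breakdown is my interpretation, not stated on the slide):

% Assumed breakdown (interpretation, not from the slide):
% a balanced tree has O(log N) levels; at each level every instance is scanned
% once for each of the O(K) candidate split attributes, giving the first term.
% Fitting a ridge model over K attributes at a node costs O(K^2 * n) to form the
% normal equations over that node's n instances, summing to the second term.
O\bigl(\underbrace{K \cdot N \log N}_{\text{split selection over a balanced tree}}
  \;+\; \underbrace{K^{2} \cdot N}_{\text{ridge models at the nodes}}\bigr)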
Summary
Outline
1 Background
2 Algorithm
3 Results
4 Summary
Summary
Conclusions
Semi-Random Model Trees perform well
They are fast: build time is practically linear in N
They can model non-linear relationships
Summary
Future Work
Improve efficiency for large K
Study more and different regression problems
More comparisons to alternative regression schemes
A streaming/MOA variant