C4.5 - pruning decision trees - Leiden...
C4.5 - pruning decision trees
Quiz 1
Q: Is a tree with only pure leaves always the best classifier you can have?
A: No.
This tree is the best classifier on the training set, but possibly not on new and unseen data. Because of overfitting, the tree may not generalize very well.
Pruning
§ Goal: Prevent overfitting to noise in the data
§ Two strategies for “pruning” the decision tree:
§ Postpruning - take a fully-grown decision tree and discard unreliable parts
§ Prepruning - stop growing a branch when information becomes unreliable
Prepruning
§ Based on statistical significance test
§ Stop growing the tree when there is no statistically significant association between any attribute and the class at a particular node
§ Most popular test: chi-squared test
§ ID3 used chi-squared test in addition to information gain
§ Only statistically significant attributes were allowed to be selected by information gain procedure
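As an illustration of such a significance test (a minimal sketch, not ID3's exact implementation; the function name is my own), the Pearson chi-squared statistic for an attribute-vs-class contingency table can be computed like this:

```python
def chi2_statistic(table):
    """Pearson chi-squared statistic for a contingency table
    (rows = attribute values, columns = classes)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # count expected under independence of attribute and class
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# A perfectly predictive binary attribute vs. a useless one:
print(chi2_statistic([[10, 0], [0, 10]]))  # 20.0 - far above the 5% critical value (3.84, 1 df)
print(chi2_statistic([[5, 5], [5, 5]]))    # 0.0  - no association, so stop growing here
```

An attribute whose statistic falls below the critical value for the chosen significance level would be excluded from selection by the information gain procedure.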
Early stopping
§ Pre-pruning may stop the growth process prematurely: early stopping
§ Classic example: XOR/Parity-problem
§ No individual attribute exhibits any significant association to the class
§ Structure is only visible in fully expanded tree
§ Pre-pruning won’t expand the root node
§ But: XOR-type problems rare in practice
§ And: pre-pruning faster than post-pruning
    a  b  class
1   0  0  0
2   0  1  1
3   1  0  1
4   1  1  0
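The XOR table above can be checked numerically; a small sketch showing that neither attribute alone yields any information gain:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def info_gain(values, labels):
    """Information gain of splitting `labels` by the attribute `values`."""
    after = sum(
        sum(1 for v in values if v == x) / len(values)
        * entropy([l for v, l in zip(values, labels) if v == x])
        for x in set(values))
    return entropy(labels) - after

# XOR data from the table above: class = a XOR b
a, b, cls = [0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]
print(info_gain(a, cls))  # 0.0 - attribute a alone tells us nothing
print(info_gain(b, cls))  # 0.0 - neither does b, so pre-pruning stops at the root
```

Each branch of either single-attribute split still contains one example of each class, so the split leaves the entropy at 1 bit and the gain is zero.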
Post-pruning
§ First, build full tree
§ Then, prune it
§ Fully-grown tree shows all attribute interactions
§ Problem: some subtrees might be due to chance effects
§ Two pruning operations:
1. Subtree replacement
2. Subtree raising
Subtree replacement
§ Bottom-up
§ Consider replacing a tree only after considering all its subtrees
Estimating error rates
§ Prune only if it reduces the estimated error
§ Error on the training data is NOT a useful estimator
§ Use hold-out set for pruning ("reduced-error pruning")
§ C4.5's method:
§ Derive confidence interval from training data
§ Use a heuristic limit, derived from this, for pruning
§ Standard Bernoulli-process-based method
§ Shaky statistical assumptions (based on training data)
Estimating Error Rates
Q: what is the error rate on the training set?
A: 0.33 (2 out of 6)
Q: will the error on the test set be bigger, smaller or equal?
A: bigger
Estimating the error
§ Assume making an error is a Bernoulli trial with probability p
§ p is unknown (true error rate)
§ We observe f, the success rate f = S/N
§ For large enough N, f follows a Normal distribution
§ Mean and variance for f: p, p(1−p)/N
Estimating the error
§ c% confidence interval [−z ≤ X ≤ z] for a random variable with 0 mean is given by:
  Pr[−z ≤ X ≤ z] = c
§ With a symmetric distribution:
  Pr[−z ≤ X ≤ z] = 1 − 2 · Pr[X ≥ z]
z-transforming f
§ Transformed value for f:
  (f − p) / √(p(1−p)/N)
  (i.e. subtract the mean and divide by the standard deviation)
§ Resulting equation:
  Pr[−z ≤ (f − p)/√(p(1−p)/N) ≤ z] = c
§ Solving for p:
  p = (f + z²/2N ± z·√(f/N − f²/N + z²/4N²)) / (1 + z²/N)
C4.5’s method
§ Error estimate for subtree is weighted sum of error estimates for all its leaves
§ Error estimate for a node (upper bound):
  e = (f + z²/2N + z·√(f/N − f²/N + z²/4N²)) / (1 + z²/N)
§ If c = 25% then z = 0.69 (from normal distribution)

  Pr[X ≥ z]   z
  1%          2.33
  5%          1.65
  10%         1.28
  20%         0.84
  25%         0.69
  40%         0.25
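The upper bound can be evaluated directly; a sketch (the function name is my own) that, with z = 0.69, reproduces the leaf estimates used in the worked example later in these slides:

```python
import math

def c45_error_estimate(f, n, z=0.69):
    """Pessimistic upper bound e on the true error at a node.
    f: observed error rate, n: training instances reaching the node,
    z: normal deviate for the confidence level (0.69 for c = 25%)."""
    return (f + z * z / (2 * n)
            + z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))) \
        / (1 + z * z / n)

print(round(c45_error_estimate(2 / 6, 6), 2))  # 0.47 (2 errors in 6 instances)
print(round(c45_error_estimate(1 / 2, 2), 2))  # 0.72 (1 error in 2 instances)
```

Note how the bound punishes small samples: the same kind of impurity costs far more at N = 2 than at N = 6.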
C4.5’s method
f is the observed error
z = 0.69
e > f: the estimate is deliberately pessimistic
e = (f + ε₁)/(1 + ε₂), where ε₁ and ε₂ shrink as N grows
As N → ∞, e → f
Example
Leaf 1: f = 0.33, e = 0.47
Leaf 2: f = 0.5, e = 0.72
Leaf 3: f = 0.33, e = 0.47
Combined using ratios 6:2:6 gives e = 0.51
Replacement node: f = 5/14 = 0.36, e = 0.46
e = 0.46 < 0.51, so prune!
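The pruning decision can be replayed with the same bound (a sketch under the counts I read from the example: leaves with 2/6, 1/2 and 2/6 errors, replaced by a node with 5/14; the exact node estimate varies slightly with how z is rounded):

```python
import math

def c45_error_estimate(f, n, z=0.69):
    """Pessimistic upper bound on the error at a node (c = 25%)."""
    return (f + z * z / (2 * n)
            + z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))) \
        / (1 + z * z / n)

leaves = [(2 / 6, 6), (1 / 2, 2), (2 / 6, 6)]  # (observed error, instances) per leaf
n_total = sum(n for _, n in leaves)            # 14 instances reach the subtree
# Weighted (6:2:6) combination of the leaf estimates:
subtree_e = sum(n * c45_error_estimate(f, n) for f, n in leaves) / n_total
# Estimate for replacing the subtree by a single leaf with 5 errors:
node_e = c45_error_estimate(5 / 14, n_total)
print(round(subtree_e, 2))                         # 0.51
print("prune" if node_e < subtree_e else "keep")   # prune
```

Subtree replacement is accepted because the single-leaf estimate is below the weighted subtree estimate, even though the replacement leaf makes more training errors (5 vs. 4).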
Summary
§ Decision Trees
§ splits – binary, multi-way
§ split criteria – information gain, gain ratio, …
§ pruning
§ No method is always superior – experiment!