3.1 Ch. 6: Decision Trees Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009...
3.1
Ch. 6: Decision Trees
Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009
based on slides from Stephen Marsland, Jia Li, and Ruoming Jin.
Longin Jan Latecki, Temple University, [email protected]
3.2
Illustrating Classification Task
(figure: a learning algorithm performs induction on the Training Set to learn a Model; the Model is then applied deductively to the Test Set)

Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
3.3
159.302 Stephen Marsland
Decision Trees
Split classification into a series of choices about features, considered in turn
Lay them out in a tree
Progress down the tree to the leaves
3.4
Play Tennis Example
Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes
3.5
Day Outlook Temp Humid Wind Play?
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
3.6
Rules and Decision Trees
Can turn the tree into a set of rules:
(outlook = sunny & humidity = normal) | (outlook = overcast) | (outlook = rain & wind = weak)

How do we generate the trees?
Need to choose features
Need to choose the order of features
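As a small illustration, the rule set above can be written directly as a Python predicate; the function name and argument names are illustrative choices, taken from the play-tennis example:

```python
def play_tennis(outlook, humidity, wind):
    """Rule set read off the decision tree: each conjunction is one
    root-to-'Yes'-leaf path, and the rule set is their disjunction."""
    return ((outlook == "sunny" and humidity == "normal")
            or outlook == "overcast"
            or (outlook == "rain" and wind == "weak"))

print(play_tennis("overcast", "high", "strong"))  # True: overcast alone suffices
```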
3.7
A tree structure classification rule for a medical example
3.8
The construction of a tree involves the following three elements:
1. The selection of the splits.
2. The decision when to declare a node terminal or to continue splitting.
3. The assignment of each terminal node to one of the classes.
3.9
Goodness of split
The goodness of a split is measured by an impurity function defined for each node.
Intuitively, we want each leaf node to be "pure", that is, to have one class dominate.
3.10
How to determine the Best Split
Before splitting: 10 records of class C0, 10 records of class C1.

Own Car?
  Yes: C0: 6, C1: 4
  No:  C0: 4, C1: 6

Car Type?
  Family: C0: 1, C1: 3
  Sports: C0: 8, C1: 0
  Luxury: C0: 1, C1: 7

Student ID?
  c1:  C0: 1, C1: 0
  ...
  c10: C0: 1, C1: 0
  c11: C0: 0, C1: 1
  ...
  c20: C0: 0, C1: 1

Which test condition is the best?
3.11
How to determine the Best Split
Greedy approach: nodes with a homogeneous class distribution are preferred.
Need a measure of node impurity:

C0: 5, C1: 5 -> non-homogeneous, high degree of impurity
C0: 9, C1: 1 -> homogeneous, low degree of impurity
3.12
Measures of Node Impurity
Entropy
Gini Index
3.13
Entropy
Let F be a feature with possible values f1, …, fn
Let p be the probability distribution of F; usually p is simply given by a histogram (p1, …, pn), where pi is the proportion of the data with value F = fi.
The entropy of p tells us how much extra information we get from knowing the value of the feature, i.e., that F = fi for a given data point.
It measures the amount of impurity in the set of features, so it makes sense to pick the feature that provides the most information.
3.14
E.g., if F is a feature with two possible values, +1 and -1, and p1 = 1 and p2 = 0, then we get no new information from knowing that F = +1 for a given example; thus the entropy is zero. If p1 = 0.5 and p2 = 0.5, the entropy is at its maximum of 1 bit.
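These two cases can be checked with a minimal sketch of the entropy of a discrete distribution (the helper name is an illustrative choice):

```python
import math

def entropy(probs):
    """Entropy in bits of a discrete distribution; 0 log 0 is taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))   # degenerate distribution: 0.0, no new information
print(entropy([0.5, 0.5]))   # uniform over two values: the maximum, 1.0 bit
```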
3.15
Entropy and Decision Tree
Entropy at a given node t:

Entropy(t) = - Σ_j p(j|t) log2 p(j|t)

(NOTE: p(j|t) is the relative frequency of class j at node t.)

Measures homogeneity of a node.
Maximum (log2 nc) when records are equally distributed among all classes, implying least information.
Minimum (0.0) when all records belong to one class, implying most information.
Entropy-based computations are similar to the GINI index computations.
3.16
Examples for computing Entropy
Entropy(t) = - Σ_j p(j|t) log2 p(j|t)

C1: 0, C2: 6
P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
Entropy = - 0 log2 0 - 1 log2 1 = - 0 - 0 = 0

C1: 1, C2: 5
P(C1) = 1/6, P(C2) = 5/6
Entropy = - (1/6) log2 (1/6) - (5/6) log2 (5/6) = 0.65

C1: 2, C2: 4
P(C1) = 2/6, P(C2) = 4/6
Entropy = - (2/6) log2 (2/6) - (4/6) log2 (4/6) = 0.92
3.17
Splitting Based on Information Gain
Information Gain:

GAIN_split = Entropy(p) - Σ_{i=1..k} (n_i / n) Entropy(i)

Parent node p is split into k partitions; n_i is the number of records in partition i, and n is the number of records at p.

Measures the reduction in entropy achieved by the split. Choose the split that achieves the most reduction (maximizes GAIN).
Used in ID3 and C4.5.
Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure.
3.18
Information Gain
Choose the feature that provides the highest information gain over all examples
That is all there is to ID3: at each stage, pick the feature with the highest information gain.
3.19
Example
Values(Wind) = {Weak, Strong}
S = [9+, 5-]
S(Weak) <- [6+, 2-]
S(Strong) <- [3+, 3-]

Gain(S, Wind) = Entropy(S) - (8/14) Entropy(S(Weak)) - (6/14) Entropy(S(Strong))
              = 0.94 - (8/14) 0.811 - (6/14) 1.00
              = 0.048
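A sketch reproducing this calculation from the [9+, 5-] / [6+, 2-] / [3+, 3-] counts (the helper names are illustrative choices):

```python
import math

def entropy(counts):
    """Entropy in bits of a node given per-class record counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the child partitions."""
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)

# Play-tennis example: splitting S = [9+, 5-] on Wind into Weak/Strong subsets
gain = information_gain([9, 5], [[6, 2], [3, 3]])
print(round(gain, 3))  # 0.048
```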
3.20
Day Outlook Temp Humid Wind Play?
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
3.21
ID3 (Quinlan)
Search over all possible trees
Greedy search, no backtracking
Susceptible to local minima
Uses all features, no pruning
Can deal with noise
Labels are the most common value of the examples
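The whole greedy procedure fits in a short recursion. The following is a minimal sketch, not Quinlan's implementation: it assumes categorical features, represents examples as feature-to-value dicts, and returns either a class label (leaf) or a `(feature, branches)` tuple; all of these representation choices are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(examples, labels, features):
    """examples: list of dicts feature -> value; labels: parallel list of classes."""
    if len(set(labels)) == 1:           # pure node: make a leaf
        return labels[0]
    if not features:                    # no features left: majority label
        return Counter(labels).most_common(1)[0][0]

    def gain(f):                        # information gain of splitting on f
        remainder = 0.0
        for v in set(ex[f] for ex in examples):
            sub = [lab for ex, lab in zip(examples, labels) if ex[f] == v]
            remainder += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - remainder

    best = max(features, key=gain)      # greedy choice, no backtracking
    branches = {}
    for v in set(ex[best] for ex in examples):
        sub_ex = [ex for ex in examples if ex[best] == v]
        sub_lab = [lab for ex, lab in zip(examples, labels) if ex[best] == v]
        branches[v] = id3(sub_ex, sub_lab, [f for f in features if f != best])
    return (best, branches)
```

On the play-tennis table this picks Outlook at the root and makes the Overcast branch a pure "Yes" leaf.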
3.22
ID3

(figure: at the root, test the feature F with the highest Gain(S, F); its branches carry values such as v and w. The subset Sv contains only examples in category c1, so that branch becomes a leaf labeled c1. At branch w, choose the feature G with the highest Gain(Sw, G), with branches x and y, and so on.)
3.23
Search
3.24
Example
Outlook [9+, 5-]
  Sunny [2+, 3-] -> ?
  Overcast [4+, 0-] -> Yes
  Rain [3+, 2-] -> ?
3.25
Inductive Bias
How does the algorithm generalise from the training examples?
Choose features with the highest information gain
Minimise the amount of information that is left
Bias towards shorter trees
Occam's Razor
Put the most useful features near the root
3.26
Missing Data
Suppose that one feature has no value
Can miss out that node and carry on down the tree, following all paths out of that node
Can therefore still get a classification
Virtually impossible with neural networks
3.27
C4.5
Improved version of ID3, also by Quinlan
Use a validation set to avoid overfitting
Could just stop choosing features (early stopping)
Better results from post-pruning:
Make the whole tree
Chop off some parts of the tree afterwards
3.28
Post-Pruning
Run over the tree
Prune each node by replacing the subtree below it with a leaf
Evaluate the error, and keep the change if the error is the same or better
3.29
Rule Post-Pruning
Turn the tree into a set of if-then rules
Remove preconditions from each rule in turn, and check accuracy
Sort rules according to accuracy
Rules are easy to read
3.30
Rule Post-Pruning
IF ((outlook = sunny) & (humidity = high)) THEN playTennis = no

Remove preconditions:
Consider IF (outlook = sunny)
And IF (humidity = high)
Test accuracy
If one of them is better, try removing both
3.31
ID3 Decision Tree
Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes
3.32
Test Case
Outlook = Sunny
Temperature = Cool
Humidity = High
Wind = Strong
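A minimal sketch tracing this test case through the ID3 tree (the tree is hard-coded here as nested `(feature, branches)` tuples; this representation and the `classify` helper are illustrative choices, not library code):

```python
# Decision tree from the slides: feature at the node, branches by value
tree = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def classify(tree, example):
    """Walk from the root to a leaf, following the example's feature values."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[example[feature]]
    return tree

case = {"Outlook": "Sunny", "Temperature": "Cool",
        "Humidity": "High", "Wind": "Strong"}
print(classify(tree, case))  # Sunny then High leads to "No"
```

Note that Temperature is never consulted: the tree classifies on Outlook, Humidity, and Wind alone.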
3.33
Party Example, Section 6.4, p. 147
Construct a decision tree based on these data:

Deadline, Party, Lazy, Activity
Urgent, Yes, Yes, Party
Urgent, No, Yes, Study
Near, Yes, Yes, Party
None, Yes, No, Party
None, No, Yes, Pub
None, Yes, No, Party
Near, No, No, Study
Near, No, Yes, TV
Near, Yes, Yes, Party
Urgent, No, No, Study
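As a hedged starting point for that construction (not the full solution), the first quantity the algorithm needs is the entropy of the Activity column, which can be computed like this:

```python
import math
from collections import Counter

# Activity column of the party example, in row order
activities = ["Party", "Study", "Party", "Party", "Pub",
              "Party", "Study", "TV", "Party", "Study"]

counts = Counter(activities)          # Party: 5, Study: 3, Pub: 1, TV: 1
n = len(activities)
entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
print(round(entropy, 3))  # roughly 1.685 bits before any split
```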
3.35
Measure of Impurity: GINI

Gini index for a given node t:

GINI(t) = 1 - Σ_j [p(j|t)]²

(NOTE: p(j|t) is the relative frequency of class j at node t.)

Maximum (1 - 1/nc) when records are equally distributed among all classes, implying least interesting information.
Minimum (0.0) when all records belong to one class, implying most interesting information.

C1: 0, C2: 6 -> Gini = 0.000
C1: 1, C2: 5 -> Gini = 0.278
C1: 2, C2: 4 -> Gini = 0.444
C1: 3, C2: 3 -> Gini = 0.500
3.36
Examples for computing GINI
GINI(t) = 1 - Σ_j [p(j|t)]²

C1: 0, C2: 6
P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
Gini = 1 - P(C1)² - P(C2)² = 1 - 0 - 1 = 0

C1: 1, C2: 5
P(C1) = 1/6, P(C2) = 5/6
Gini = 1 - (1/6)² - (5/6)² = 0.278

C1: 2, C2: 4
P(C1) = 2/6, P(C2) = 4/6
Gini = 1 - (2/6)² - (4/6)² = 0.444
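These values can be checked from the raw counts with a small sketch (the helper name is an illustrative choice):

```python
def gini_from_counts(counts):
    """Gini index of a node given per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
    print(counts, round(gini_from_counts(counts), 3))
# prints 0.0, 0.278, 0.444, 0.5 for the four nodes on these slides
```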
3.37
Splitting Based on GINI
Used in CART, SLIQ, SPRINT.

When a node p is split into k partitions (children), the quality of the split is computed as

GINI_split = Σ_{i=1..k} (n_i / n) GINI(i)

where n_i = number of records at child i, and n = number of records at node p.
3.38
Binary Attributes: Computing GINI Index
Splits into two partitions.
Larger and purer partitions are sought.

B?
  Yes -> Node N1
  No  -> Node N2

Parent: C1: 6, C2: 6, Gini = 0.500

      N1  N2
C1    5   1
C2    2   4

N1 holds 7 records and N2 holds 5, so:
Gini(N1) = 1 - (5/7)² - (2/7)² = 0.408
Gini(N2) = 1 - (1/5)² - (4/5)² = 0.320
Gini(Children) = 7/12 x 0.408 + 5/12 x 0.320 = 0.371
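A sketch reproducing this weighted-Gini computation from the count matrix (the helper names are illustrative choices):

```python
def gini(counts):
    """Gini index of a node given per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(children):
    """Size-weighted Gini of the child partitions of a split."""
    n = sum(sum(ch) for ch in children)
    return sum(sum(ch) / n * gini(ch) for ch in children)

n1, n2 = [5, 2], [1, 4]                # class counts (C1, C2) at N1 and N2
print(round(gini([6, 6]), 3))          # parent node: 0.5
print(round(gini_split([n1, n2]), 3))  # children: 0.371
```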
3.39
Categorical Attributes: Computing Gini Index
For each distinct value, gather counts for each class in the dataset.
Use the count matrix to make decisions.

Multi-way split:

      CarType
      Family  Sports  Luxury
C1    1       2       1
C2    4       1       1
Gini 0.393

Two-way split (find the best partition of values):

      CarType
      {Sports, Luxury}  {Family}
C1    3                 1
C2    2                 4
Gini 0.400

      CarType
      {Sports}  {Family, Luxury}
C1    2         2
C2    1         5
Gini 0.419
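The three quality figures above can be reproduced with the same size-weighted Gini idea; the count matrices below are transcribed from the slide, and the helper names are illustrative choices:

```python
def gini(counts):
    """Gini index of a node given per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(children):
    """Size-weighted Gini of the child partitions of a split."""
    n = sum(sum(ch) for ch in children)
    return sum(sum(ch) / n * gini(ch) for ch in children)

splits = {
    "multi-way (Family|Sports|Luxury)": [[1, 4], [2, 1], [1, 1]],
    "{Sports,Luxury} | {Family}":       [[3, 2], [1, 4]],
    "{Sports} | {Family,Luxury}":       [[2, 1], [2, 5]],
}
for name, children in splits.items():
    print(name, round(gini_split(children), 3))
# the multi-way split scores best (lowest weighted Gini, 0.393)
```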
3.40
Homework
Written homework for everyone, due on Oct. 5: Problem 6.3, p. 151.
Matlab Homework:
Demonstrate decision tree learning and classification on the party example.