Transcript of seem5470/lecture/Basic-1R-Tree-2017.pdf
Basic Learning Methods: 1R, Decision Trees
Supervised Learning
● Aim: construct a model that can predict the class label of a data instance
  ◆ Classification learning
● Training / Learning
  ◆ Automatically construct the model using training data
● Testing / Operational Usage
  ◆ Use the learned model to predict the class of an unseen data instance
  ◆ Measure the performance of the model
Simplicity first
● Simple algorithms sometimes work well!
● There are many kinds of simple structure, e.g.
  ◆ One attribute does all the work
  ◆ All attributes contribute equally & independently
  ◆ A weighted linear combination might do
  ◆ Instance-based: use a few prototypes
  ◆ Use simple logical rules
● Sometimes, the success of a method depends on the domain
Inferring rudimentary rules
● 1R: learns a 1-level decision tree
  ◆ i.e., rules that all test one particular attribute
● Basic version
  ◆ one branch for each value
  ◆ each branch assigns the most frequent class
● Error rate: proportion of instances that don’t belong to the majority class of their corresponding branch
● Choose the attribute with the lowest error rate (assumes nominal attributes)
Input instances with attributes

Outlook   Temp  Humidity  Windy  Play
Rainy     Mild  High      True   No
Overcast  Hot   Normal    False  Yes
Overcast  Mild  High      True   Yes
Sunny     Mild  Normal    True   Yes
Rainy     Mild  Normal    False  Yes
Sunny     Cool  Normal    False  Yes
Sunny     Mild  High      False  No
Overcast  Cool  Normal    True   Yes
Rainy     Cool  Normal    True   No
Rainy     Cool  Normal    False  Yes
Rainy     Mild  High      False  Yes
Overcast  Hot   High      False  Yes
Sunny     Hot   High      True   No
Sunny     Hot   High      False  No

● The Play attribute has a special role – it is the class attribute
● Learn a model to predict the outcome of the class attribute (i.e., Play)
Rule Template for 1R

Template of the knowledge (a simple rule):
If <attribute> is:
  <value1>, then <class> is <outcome1>
  <value2>, then <class> is <outcome2>
  ...
Pseudo-code for 1R

For each attribute,
  For each value of the attribute, make a rule as follows:
    count how often each class appears
    find the most frequent class
    make the rule assign that class to this attribute-value
  Calculate the error rate of the rules
Choose the rules with the smallest error rate

Note: “missing” is treated as a separate attribute value
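The pseudocode above can be sketched in Python (a minimal sketch; the function and variable names are my own):

```python
from collections import Counter, defaultdict

def one_r(rows, class_attr):
    """1R: for each attribute, make one rule per value (predicting that
    value's most frequent class), then keep the attribute whose rules
    make the fewest errors overall."""
    best = None
    for attr in (a for a in rows[0] if a != class_attr):
        # Count class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[class_attr]] += 1
        # Each value predicts its majority class.
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Errors: instances not in their value's majority class.
        errors = sum(n for v, c in counts.items()
                     for cls, n in c.items() if cls != rules[v])
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best  # (attribute, {value: predicted class}, total errors)
```

On the weather data this keeps Outlook (Sunny → No, Overcast → Yes, Rainy → Yes) with 4/14 errors; Humidity ties at 4/14, matching the two solutions on the “Output Solution” slide.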
Processing the Attributes

Attribute    Rules            Errors  Total errors
Outlook      Sunny → No       2/5     4/14
             Overcast → Yes   0/4
             Rainy → Yes      2/5
Temp         Hot → No         2/4     5/14
             Mild → Yes       2/6
             Cool → Yes       1/4
Humidity     High → No        3/7     4/14
             Normal → Yes     1/7
Windy        False → Yes      2/8     5/14
             True → No        3/6
Output Solution
● There are two solutions, shown below; the final solution can be selected arbitrarily from either of them.
● First solution –
  If Outlook is:
    ● Sunny, then play is no
    ● Overcast, then play is yes
    ● Rainy, then play is yes
● Second solution –
  If Humidity is:
    ● High, then play is no
    ● Normal, then play is yes
Dealing with numeric attributes
● Discretize numeric attributes
● Divide each attribute’s range into intervals
  ◆ Sort instances according to the attribute’s values
  ◆ Place breakpoints where the (majority) class changes
  ◆ This minimizes the total error
● Example: temperature from the weather data (excerpt):

Outlook   Temperature  Humidity  Windy  Play
Rainy     75           80        False  Yes
Overcast  83           86        False  Yes
Sunny     80           90        True   No
Sunny     85           85        False  No
…         …            …         …      …

64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No
The problem of overfitting
● This procedure is very sensitive to noise
● One instance with an incorrect class label will probably produce a separate interval
● Also: a time-stamp attribute would have zero errors
● Simple solution:
  ◆ enforce a minimum number of instances in the majority class per interval
● Example (with min = 3) – before:

64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No

and after:

64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes No Yes Yes Yes | No No Yes Yes Yes | No Yes Yes No
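A minimal Python sketch of this interval-forming step (names are my own; note that 1R additionally merges adjacent intervals that end up with the same majority class, which is what collapses the partition to the single 77.5 breakpoint used for the Temperature rule on the next slide):

```python
def discretize(pairs, min_majority=3):
    """Sweep the sorted (value, class) pairs and close the current
    interval once its majority class has at least `min_majority`
    members and the next instance's class differs from that majority
    (never splitting between two identical values)."""
    pairs = sorted(pairs)
    breakpoints, counts = [], {}
    for i, (value, cls) in enumerate(pairs):
        counts[cls] = counts.get(cls, 0) + 1
        majority, n = max(counts.items(), key=lambda kv: kv[1])
        nxt = pairs[i + 1] if i + 1 < len(pairs) else None
        if nxt and n >= min_majority and nxt[1] != majority and nxt[0] != value:
            breakpoints.append((value + nxt[0]) / 2)  # midpoint break
            counts = {}                               # start a new interval
    return breakpoints
```

On the temperature data with min = 3 this yields breakpoints at 70.5 and 77.5, i.e. the three-interval partition shown in the example.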
With overfitting avoidance

Attribute     Rules                       Errors  Total errors
Outlook       Sunny → No                  2/5     4/14
              Overcast → Yes              0/4
              Rainy → Yes                 2/5
Temperature   <= 77.5 → Yes               3/10    5/14
              > 77.5 → No*                2/4
Humidity      <= 82.5 → Yes               1/7     3/14
              > 82.5 and <= 95.5 → No     2/6
              > 95.5 → Yes                0/1
Windy         False → Yes                 2/8     5/14
              True → No*                  3/6

● The final solution –
  If Humidity is:
    ● <= 82.5, then play is yes
    ● > 82.5 and <= 95.5, then play is no
    ● > 95.5, then play is yes
Discussion of 1R
Robert Holte, “Very Simple Classification Rules Perform Well on Most Commonly Used Datasets”, Machine Learning, 11:63-91, 1993.
● 1R was described in a paper by Holte (1993)
● It contains an experimental evaluation on 16 datasets (using cross-validation so that results were representative of performance on future data)
● The minimum number of instances was set to 6 after some experimentation
● 1R’s simple rules performed not much worse than much more complex decision trees
● Simplicity first pays off!
Decision Trees
● Found in various applications such as product recommendation
● One example: Netflix
Decision Trees
● Decision tree
  ◆ A flow-chart-like tree structure
  ◆ Internal node denotes a test on an attribute
  ◆ Branch represents an outcome of the test
  ◆ Leaf nodes represent class labels or class distribution
● Use of a decision tree: classifying an unknown sample
  ◆ Test the attribute values of the sample against the decision tree
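That test-and-descend procedure can be sketched in Python (a minimal sketch; the tuple encoding of the tree and all names are my own, hand-encoding the age/student/credit-rating example tree that appears below):

```python
def classify(tree, instance):
    """Walk a decision tree encoded as (attribute, {value: subtree})
    tuples; anything that is not a tuple is a leaf class label."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree

# Hand-encoded version of the age/student/credit-rating example tree.
tree = ("age", {
    "<=30":   ("student", {"no": "no", "yes": "yes"}),
    "31..40": "yes",
    ">40":    ("credit_rating", {"excellent": "no", "fair": "yes"}),
})
```

For example, classify(tree, {"age": "<=30", "student": "yes"}) follows the <=30 branch to the student test and returns "yes".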
Decision Trees
age?
├── <=30   → student?
│             ├── no  → no
│             └── yes → yes
├── 31..40 → yes
└── >40    → credit rating?
              ├── excellent → no
              └── fair      → yes
Learning Decision Trees From Data
● Strategy: top down, in recursive divide-and-conquer fashion
  ◆ First: select an attribute for the root node; create a branch for each possible attribute value
  ◆ Then: split the instances into subsets, one for each branch extending from the node
  ◆ Finally: repeat recursively for each branch, using only the instances that reach the branch
● Stop if all instances have the same class
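The recursion above can be sketched in Python (a minimal sketch; names are my own, and the purity score here simply counts misclassified instances per branch, whereas the slides that follow use information gain instead):

```python
from collections import Counter, defaultdict

def errors_if_split(rows, attr, class_attr):
    """Negated count of instances not in their branch's majority class
    (a simple purity score; information gain is the usual choice)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[class_attr])
    return -sum(len(g) - Counter(g).most_common(1)[0][1]
                for g in groups.values())

def build_tree(rows, attrs, class_attr, select=errors_if_split):
    """Top-down divide-and-conquer induction of a decision tree."""
    classes = [r[class_attr] for r in rows]
    if len(set(classes)) == 1 or not attrs:            # pure, or no attrs left
        return Counter(classes).most_common(1)[0][0]   # leaf: majority class
    best = max(attrs, key=lambda a: select(rows, a, class_attr))
    rest = [a for a in attrs if a != best]
    return (best, {v: build_tree([r for r in rows if r[best] == v],
                                 rest, class_attr, select)
                   for v in {r[best] for r in rows}})
```

Any scoring function with the same signature can be plugged in as `select`, e.g. one based on the information gain defined later.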
Attribute Selection by Information Gain Computation
attribute1  attribute2  class label
high        high        yes
high        high        yes
high        high        yes
high        low         yes
high        low         yes
high        low         yes
high        low         no
low         low         no
low         low         no
low         high        no
low         high        no
low         high        no
Consider attribute1:

attribute1  yes  no
high        6    1
low         0    5

Consider attribute2:

attribute2  yes  no
high        3    3
low         3    3
attribute1 is better than attribute2 for classification purposes!
Which attribute to select?
Criterion for attribute selection
● Which is the best attribute?
● We want to get the smallest tree
● Heuristic: choose the attribute that produces the “purest” nodes
● Popular impurity criterion: information gain
  ◆ Information gain increases with the average purity of the subsets
● Strategy: choose the attribute that gives the greatest information gain
Computing Information
● Measure information in bits
● Given a probability distribution, the info required to predict an event is the distribution’s entropy
● Entropy gives the information required in bits (can involve fractions of bits!)
● Formula for computing the entropy:

entropy(p1, p2, …, pn) = −p1 log p1 − p2 log p2 − ⋯ − pn log pn
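A minimal Python version of this formula (the function name is my own; logs are base 2, so the result is in bits):

```python
import math

def entropy(*probs):
    """entropy(p1, ..., pn) = -p1*log2(p1) - ... - pn*log2(pn),
    with 0*log(0) taken as 0 (its limiting value)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For example, entropy(9/14, 5/14) is about 0.940 bits, the entropy of the class distribution of the weather data used below.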
Example: attribute Outlook

Outlook = Sunny:
info([2,3]) = entropy(2/5, 3/5) = −(2/5) log(2/5) − (3/5) log(3/5) = 0.971 bits

Outlook = Overcast:
info([4,0]) = entropy(1, 0) = −1 log 1 − 0 log 0 = 0 bits
(Note: 0 log 0 is normally undefined; it is taken as 0 here.)

Outlook = Rainy:
info([3,2]) = entropy(3/5, 2/5) = −(3/5) log(3/5) − (2/5) log(2/5) = 0.971 bits

Expected information for the attribute:
info([2,3],[4,0],[3,2]) = (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 = 0.693 bits
Computing Information Gain
Information gain = information before splitting − information after splitting

gain(Outlook) = info([9,5]) − info([2,3],[4,0],[3,2])
             = 0.940 − 0.693
             = 0.247 bits

Information gain for the attributes from the weather data:
gain(Outlook)     = 0.247 bits
gain(Temperature) = 0.029 bits
gain(Humidity)    = 0.152 bits
gain(Windy)       = 0.048 bits
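These numbers can be reproduced with a short Python sketch (function names are my own; `entropy` here takes raw class counts rather than probabilities):

```python
import math

def entropy(counts):
    """Entropy, in bits, of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def info_gain(parent_counts, subset_counts):
    """Information before splitting minus the weighted information after."""
    n = sum(parent_counts)
    after = sum(sum(s) / n * entropy(s) for s in subset_counts)
    return entropy(parent_counts) - after

# gain(Outlook) = info([9,5]) - info([2,3],[4,0],[3,2]) ≈ 0.247 bits
gain_outlook = info_gain([9, 5], [[2, 3], [4, 0], [3, 2]])
```

Passing the Windy branch counts [[6,2],[3,3]] instead reproduces gain(Windy) ≈ 0.048 bits.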
Continuing to split (within the Outlook = Sunny branch):

gain(Temperature) = 0.571 bits
gain(Humidity)    = 0.971 bits
gain(Windy)       = 0.020 bits
Final Decision Tree
Note: not all leaves need to be pure; sometimes identical instances have different classes. Splitting stops when the data can’t be split any further.
Wish list for a purity measure
● Properties we require from a purity measure:
  ◆ When a node is pure, the measure should be zero
  ◆ When impurity is maximal (i.e. all classes equally likely), the measure should be maximal
  ◆ The measure should obey the multistage property (i.e. decisions can be made in several stages):

    measure([2,3,4]) = measure([2,7]) + (7/9) × measure([3,4])

● Entropy is the only function that satisfies all three properties!
Properties of the entropy

The multistage property:
entropy(p, q, r) = entropy(p, q + r) + (q + r) × entropy(q/(q + r), r/(q + r))

Simplification of computation:
info([2,3,4]) = −(2/9) log(2/9) − (3/9) log(3/9) − (4/9) log(4/9)
             = [−2 log 2 − 3 log 3 − 4 log 4 + 9 log 9] / 9

Note: instead of maximizing info gain we could just minimize information
Highly-branching attributes
● Problematic: attributes with a large number of values (extreme case: ID code)
● Subsets are more likely to be pure if there is a large number of values
● Information gain is biased towards choosing attributes with a large number of values
● This may result in overfitting (selection of an attribute that is non-optimal for prediction)
● Another problem: fragmentation
Weather data with ID code

ID code  Outlook   Temp.  Humidity  Windy  Play
A        Sunny     Hot    High      False  No
B        Sunny     Hot    High      True   No
C        Overcast  Hot    High      False  Yes
D        Rainy     Mild   High      False  Yes
E        Rainy     Cool   Normal    False  Yes
F        Rainy     Cool   Normal    True   No
G        Overcast  Cool   Normal    True   Yes
H        Sunny     Mild   High      False  No
I        Sunny     Cool   Normal    False  Yes
J        Rainy     Mild   Normal    False  Yes
K        Sunny     Mild   Normal    True   Yes
L        Overcast  Mild   High      True   Yes
M        Overcast  Hot    Normal    False  Yes
N        Rainy     Mild   High      True   No
Tree stump for the ID code attribute

Entropy of split:
info(ID code) = info([0,1]) + info([0,1]) + ⋯ + info([0,1]) = 0 bits

This implies that information gain is maximal for the ID code attribute (namely 0.940 bits)
Gain ratio
● Gain ratio: a modification of the information gain that reduces its bias
● Gain ratio takes the number and size of branches into account when choosing an attribute
  ◆ It corrects the information gain by taking the intrinsic information of a split into account
● Intrinsic information: entropy of the distribution of instances into branches (i.e. how much info do we need to tell which branch an instance belongs to)
Computing the gain ratio

Example: intrinsic information for ID code:
info([1,1,…,1]) = 14 × (−(1/14) × log(1/14)) = 3.807 bits

The value of an attribute decreases as its intrinsic information gets larger.

Definition of gain ratio:
gain_ratio(attribute) = gain(attribute) / intrinsic_info(attribute)

Example:
gain_ratio(ID code) = 0.940 bits / 3.807 bits = 0.246
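A short Python check of these numbers (names are my own; `entropy` takes branch sizes as raw counts, so it computes the intrinsic information directly):

```python
import math

def entropy(counts):
    """Entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain_ratio(gain, branch_sizes):
    """Gain divided by intrinsic information (the entropy of the
    distribution of instances into branches)."""
    return gain / entropy(branch_sizes)

# ID code: 14 branches of one instance each, gain = 0.940 bits
ratio_id = gain_ratio(0.940, [1] * 14)        # ≈ 0.246
# Outlook: branches of sizes 5, 4 and 5, gain = 0.247 bits
ratio_outlook = gain_ratio(0.247, [5, 4, 5])  # ≈ 0.157
```

Both values match the figures on the next slide, where Outlook’s gain ratio of 0.157 is the best among the four original attributes.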
Gain ratios for weather data

              Outlook                      Temperature
Info:         0.693                        0.911
Gain:         0.940 − 0.693 = 0.247        0.940 − 0.911 = 0.029
Split info:   info([5,4,5]) = 1.577        info([4,6,4]) = 1.557
Gain ratio:   0.247/1.577 = 0.157          0.029/1.557 = 0.019

              Humidity                     Windy
Info:         0.788                        0.892
Gain:         0.940 − 0.788 = 0.152        0.940 − 0.892 = 0.048
Split info:   info([7,7]) = 1.000          info([8,6]) = 0.985
Gain ratio:   0.152/1 = 0.152              0.048/0.985 = 0.049
More on the gain ratio
● “Outlook” still comes out top
● However: “ID code” has a greater gain ratio
  ◆ Standard fix: an ad hoc test to prevent splitting on that type of attribute
● Problem with gain ratio: it may overcompensate
  ◆ May choose an attribute just because its intrinsic information is very low
  ◆ Standard fix: only consider attributes with greater than average information gain
Discussion
● Top-down induction of decision trees: ID3, an algorithm developed by Ross Quinlan
  ◆ Gain ratio is just one modification of this basic algorithm
  ◆ C4.5: deals with numeric attributes, missing values, noisy data
● There are many other attribute selection criteria! (But little difference in the accuracy of the result)