Decision Trees: one possible representation for hypotheses
Evolutionary Computing Systems Lab (ECSL), University of Nevada, Reno
DECISION TREES
Resources
Artificial Intelligence, 3rd Edition, Patrick Henry Winston, Ch. 21. http://www.cse.unr.edu/~sushil/class/games/notes/ch21.pdf
Artificial Intelligence: A Modern Approach, 3rd Edition, Russell and Norvig, Ch. 18.3, pg. 531-554
Identification Tree
A type of decision tree.
The Winston book calls its methods SPROUTER and PRUNER, but it’s basically a simplified example of an algorithm called ‘ID3’.
Identification Tree
Name (Sample ID)  Hair    Height   Weight   Lotion  Result
Sarah             Blonde  Average  Light    No      Sunburned
Dana              Blonde  Tall     Average  Yes     None
Alex              Brown   Short    Average  Yes     None
Annie             Blonde  Short    Average  No      Sunburned
Emily             Red     Average  Heavy    No      Sunburned
Pete              Brown   Tall     Heavy    No      None
John              Brown   Average  Heavy    No      None
Katie             Blonde  Short    Light    Yes     None
Sunburn Dataset
Select one attribute to be predicted/identified (here, Result).
All other attributes are used to identify the selected target attribute, or classification.
Identification Tree
Predict Sunburns
More than one tree can correctly identify the dataset.
Some trees generalize information better.
  Smaller trees tend to be better (Occam’s Razor).
  The smallest identification tree consistent with the samples is the one most likely to identify unknown objects correctly.
How do we construct the smallest/‘best’ tree?
Identification Tree
It is computationally impractical to find the smallest tree when many tests are required.
Use a procedure that builds small trees, but is NOT guaranteed to build the SMALLEST possible tree.
Identification Tree
Split the samples based on the best attribute.
  The best attribute is the single attribute that comes closest to correctly grouping the samples based on the target classification.
Number of samples in homogeneous sets, by attribute:
  Hair: 4, Height: 2, Weight: 0, Lotion: 3
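The homogeneous-set counts can be checked directly against the sunburn table. The following is a minimal Python sketch, not code from the slides or from Winston's book; the names `ROWS`, `ATTRIBUTES`, and `homogeneous_count` are made up for illustration.

```python
from collections import defaultdict

ATTRIBUTES = ["Hair", "Height", "Weight", "Lotion"]
# Sunburn dataset: (Hair, Height, Weight, Lotion, Result)
ROWS = [
    ("Blonde", "Average", "Light",   "No",  "Sunburned"),  # Sarah
    ("Blonde", "Tall",    "Average", "Yes", "None"),       # Dana
    ("Brown",  "Short",   "Average", "Yes", "None"),       # Alex
    ("Blonde", "Short",   "Average", "No",  "Sunburned"),  # Annie
    ("Red",    "Average", "Heavy",   "No",  "Sunburned"),  # Emily
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),       # Pete
    ("Brown",  "Average", "Heavy",   "No",  "None"),       # John
    ("Blonde", "Short",   "Light",   "Yes", "None"),       # Katie
]

def homogeneous_count(rows, attr_index):
    """Count samples that land in branches containing a single result."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[attr_index]].append(row[-1])
    return sum(len(results) for results in branches.values()
               if len(set(results)) == 1)

counts = {name: homogeneous_count(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
print(counts)  # {'Hair': 4, 'Height': 2, 'Weight': 0, 'Lotion': 3}
```

Hair puts the most samples into homogeneous sets, so under this simple measure it would be the first attribute to split on.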
Identification Tree
Select the best attribute, and repeat with the remaining attributes.
  Must repeat for each heterogeneous branch.
  Only split the samples that went down that branch.
  The next attribute you select for one branch may be different from the attribute you select for another branch, even if they share the same parent node.
Identification Tree
In real data, it is unlikely to get ANY homogeneous branches.
Need a measure of inhomogeneity/disorder/entropy.
  Minimize disorder/entropy (or maximize Information Gain).
  Many different measurements/calculations can be used.
  Example: Entropy(S)
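The slide's formula for Entropy(S) did not survive in this transcript. As an assumption, the sketch below uses the usual Shannon form, Entropy(S) = -sum over classes of p_i * log2(p_i), where p_i is the fraction of samples in S with class i. The function name `entropy` is illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

# Result column of the sunburn dataset: 3 Sunburned, 5 None.
results = ["Sunburned", "None", "None", "Sunburned",
           "Sunburned", "None", "None", "None"]
print(round(entropy(results), 3))  # 0.954
```

A homogeneous set has entropy 0, and an evenly split two-class set has entropy 1, so lower values mean less disorder.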
Identification Tree
Results using the new disorder measurement
[Figures: Hair attribute disorder calculation; all disorder calculations (first node); all disorder calculations (second node)]
Identification Tree
Information Gain
  Expected reduction in entropy due to sorting sample set S on attribute A.
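To make the definition concrete, here is a hedged Python sketch that computes information gain for each attribute of the sunburn dataset. It assumes the common formulation Gain(S, A) = Entropy(S) minus the size-weighted average entropy of the subsets produced by splitting on A; the helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ["Hair", "Height", "Weight", "Lotion"]
# Sunburn dataset: (Hair, Height, Weight, Lotion, Result)
ROWS = [
    ("Blonde", "Average", "Light",   "No",  "Sunburned"),
    ("Blonde", "Tall",    "Average", "Yes", "None"),
    ("Brown",  "Short",   "Average", "Yes", "None"),
    ("Blonde", "Short",   "Average", "No",  "Sunburned"),
    ("Red",    "Average", "Heavy",   "No",  "Sunburned"),
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),
    ("Brown",  "Average", "Heavy",   "No",  "None"),
    ("Blonde", "Short",   "Light",   "Yes", "None"),
]

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the whole set minus the weighted entropy after the split."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[attr_index]].append(row[-1])
    after = sum(len(b) / len(rows) * entropy(b) for b in branches.values())
    return entropy([row[-1] for row in rows]) - after

gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("best attribute:", best)
```

On this dataset Hair has the highest gain, which agrees with the homogeneous-set count.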
Identification Tree
SPROUTER algorithm
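The slide's statement of the algorithm is not preserved here. As a rough illustration only, the following Python sketch builds a tree in the ID3 style the earlier slides describe: pick the attribute with the lowest average disorder, split, and recurse on each heterogeneous branch. It is not Winston's SPROUTER verbatim, and every name in it is an assumption.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ["Hair", "Height", "Weight", "Lotion"]
# Sunburn dataset: (Hair, Height, Weight, Lotion, Result)
ROWS = [
    ("Blonde", "Average", "Light",   "No",  "Sunburned"),
    ("Blonde", "Tall",    "Average", "Yes", "None"),
    ("Brown",  "Short",   "Average", "Yes", "None"),
    ("Blonde", "Short",   "Average", "No",  "Sunburned"),
    ("Red",    "Average", "Heavy",   "No",  "Sunburned"),
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),
    ("Brown",  "Average", "Heavy",   "No",  "None"),
    ("Blonde", "Short",   "Light",   "Yes", "None"),
]

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def average_disorder(rows, attr_index):
    """Size-weighted entropy of the branches created by one attribute."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[attr_index]].append(row[-1])
    return sum(len(b) / len(rows) * entropy(b) for b in branches.values())

def build_tree(rows, attr_indices):
    """Return a leaf label, or (attribute name, {value: subtree})."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1 or not attr_indices:
        return Counter(labels).most_common(1)[0][0]  # leaf
    best = min(attr_indices, key=lambda i: average_disorder(rows, i))
    remaining = [i for i in attr_indices if i != best]
    groups = defaultdict(list)
    for row in rows:
        groups[row[best]].append(row)
    return (ATTRIBUTES[best],
            {value: build_tree(subset, remaining)
             for value, subset in groups.items()})

tree = build_tree(ROWS, list(range(len(ATTRIBUTES))))
print(tree)
```

On the sunburn data this greedy procedure splits on Hair first and then on Lotion within the Blonde branch.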
Tree to Rules
Each path, from root to leaf, is a rule.
  The values of the attribute nodes along the path are the antecedents.
  The leaf value is the consequent.
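The path-to-rule conversion can be sketched in a few lines of Python. The `TREE` literal is written out by hand and is the tree an entropy-based split produces on the sunburn dataset (Hair, then Lotion for blondes); the nested-tuple representation and the name `tree_to_rules` are assumptions for illustration.

```python
# A node is either a leaf label (str) or (attribute, {value: child}).
TREE = ("Hair", {
    "Blonde": ("Lotion", {"No": "Sunburned", "Yes": "None"}),
    "Brown": "None",
    "Red": "Sunburned",
})

def tree_to_rules(node, path=()):
    """Collect one (antecedents, consequent) pair per root-to-leaf path."""
    if isinstance(node, str):
        return [(path, node)]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(tree_to_rules(child, path + ((attribute, value),)))
    return rules

rules = tree_to_rules(TREE)
for antecedents, consequent in rules:
    conditions = " and ".join(f"{a} is {v}" for a, v in antecedents)
    print(f"IF {conditions} THEN {consequent}")
```

Each leaf yields exactly one rule, so this tree produces four rules.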
Simplify Rules
For each rule, drop an antecedent if doing so won’t change what the rule does on all the samples.
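One way to read this step is: try removing each antecedent in turn, and keep the removal if the shortened rule still makes no wrong prediction on any sample. The sketch below implements that reading in Python; the consistency test is an assumption (Winston's book uses contingency tables), and the rule list is the one obtained from the Hair/Lotion tree.

```python
ATTRIBUTES = ["Hair", "Height", "Weight", "Lotion"]
# Sunburn dataset: (Hair, Height, Weight, Lotion, Result)
ROWS = [
    ("Blonde", "Average", "Light",   "No",  "Sunburned"),
    ("Blonde", "Tall",    "Average", "Yes", "None"),
    ("Brown",  "Short",   "Average", "Yes", "None"),
    ("Blonde", "Short",   "Average", "No",  "Sunburned"),
    ("Red",    "Average", "Heavy",   "No",  "Sunburned"),
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),
    ("Brown",  "Average", "Heavy",   "No",  "None"),
    ("Blonde", "Short",   "Light",   "Yes", "None"),
]
RULES = [
    ([("Hair", "Blonde"), ("Lotion", "No")], "Sunburned"),
    ([("Hair", "Blonde"), ("Lotion", "Yes")], "None"),
    ([("Hair", "Brown")], "None"),
    ([("Hair", "Red")], "Sunburned"),
]

def matches(row, antecedents):
    return all(row[ATTRIBUTES.index(a)] == v for a, v in antecedents)

def is_consistent(antecedents, consequent):
    """True if every sample the rule fires on has the rule's consequent."""
    return all(row[-1] == consequent
               for row in ROWS if matches(row, antecedents))

def simplify(antecedents, consequent):
    kept = list(antecedents)
    for condition in antecedents:
        trial = [c for c in kept if c != condition]
        if is_consistent(trial, consequent):
            kept = trial  # dropping this antecedent causes no errors
    return kept

simplified = [(simplify(a, c), c) for a, c in RULES]
for antecedents, consequent in simplified:
    print(antecedents, "->", consequent)
```

Only the second rule shrinks: everyone who used lotion is unburned regardless of hair color, so the Hair antecedent can go.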
Eliminate Rules
Once all individual rules have been simplified, you can eliminate unnecessary rules.
Create a “default rule” that eliminates the most rules.
In the event of a tie, make up some metric to break the tie. Examples:
  The default rule covers the most common consequent in the sample set.
  The default rule leaves the simplest rules.
[Figures: rule sets chosen by most common consequent and by simplest rules]
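A small Python sketch of default-rule selection follows. It assumes the simplified sunburn rules as input and breaks ties with the first metric listed (the most common consequent in the sample set); the function name `add_default_rule` is hypothetical.

```python
from collections import Counter

# Result column of the sunburn dataset.
SAMPLE_RESULTS = ["Sunburned", "None", "None", "Sunburned",
                  "Sunburned", "None", "None", "None"]
SIMPLIFIED_RULES = [
    ([("Hair", "Blonde"), ("Lotion", "No")], "Sunburned"),
    ([("Lotion", "Yes")], "None"),
    ([("Hair", "Brown")], "None"),
    ([("Hair", "Red")], "Sunburned"),
]

def add_default_rule(rules, sample_results):
    """Replace the rules for one consequent with a single default rule."""
    rules_per_consequent = Counter(consequent for _, consequent in rules)
    sample_counts = Counter(sample_results)
    # Primary key: how many rules the default would eliminate.
    # Tie-break: how common the consequent is among the samples.
    default = max(rules_per_consequent,
                  key=lambda c: (rules_per_consequent[c], sample_counts[c]))
    kept = [(a, c) for a, c in rules if c != default]
    return kept, default

kept, default = add_default_rule(SIMPLIFIED_RULES, SAMPLE_RESULTS)
for antecedents, consequent in kept:
    print(antecedents, "->", consequent)
print("otherwise ->", default)
```

Here each consequent has two rules, so the tie-break decides: None is the more common result, and the two Sunburned rules remain alongside the default.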
Decision Tree Algorithms
ID3 (Iterative Dichotomiser 3)
  Greedy, so it gets stuck on local optima
  Not good on attributes with continuous values
C4.5/J4.8
  Extension of ID3
  Better handling of attributes with continuous values
  Can handle training data where some attribute values are missing/unknown
  Handles attributes with different costs
  Prunes the tree after creation
C5.0/See5.0
  Commercial, closed-source
  Not covered here, but it exists
C4.5
Pruning
  Helps avoid overfitting
  Prepruning
    Deciding, during tree construction, not to split a set of samples any further, based on some heuristic
    Usually based on some statistical test, such as chi-squared
  Postpruning
    Subtree replacement
    Subtree raising
C4.5
Continuous Values
  For an attribute with continuous values, sort all samples based on that attribute.
  Mark a ‘split point’ between samples where the classification changes.
  Calculate information gain on all split points.
  Select the split point with the highest information gain and use it for that attribute.
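The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not C4.5's actual code: the threshold is placed at the midpoint between neighbouring samples, and the numeric values and labels are made up (the sunburn dataset has no continuous attribute).

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, information gain) of the best split point."""
    pairs = sorted(zip(values, labels))           # step 1: sort by value
    sorted_labels = [label for _, label in pairs]
    base = entropy(sorted_labels)
    best = None
    for i in range(1, len(pairs)):
        (low, low_label), (high, high_label) = pairs[i - 1], pairs[i]
        if low_label == high_label or low == high:
            continue                              # step 2: class must change
        threshold = (low + high) / 2              # midpoint is an assumption
        left, right = sorted_labels[:i], sorted_labels[i:]
        after = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(pairs)
        gain = base - after                       # step 3: gain per split
        if best is None or gain > best[1]:
            best = (threshold, gain)              # step 4: keep the best
    return best

# Hypothetical numeric attribute and class labels, for illustration only.
values = [150, 155, 160, 165, 170, 175, 180]
labels = ["Sunburned", "Sunburned", "Sunburned", "None",
          "Sunburned", "None", "None"]
threshold, gain = best_split(values, labels)
print(f"split at {threshold} with gain {gain:.3f}")
```

Only boundaries where the class changes are evaluated, which keeps the number of candidate thresholds small on sorted data.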