Decision Trees
MS Algorithms
Decision Trees
The basic idea – creating a series of splits, also called nodes, in the tree.
The algorithm adds a node to the model every time an input column is found to be significantly correlated with the predictable column.
Predicting Discrete Columns
More on Decision Trees
Each internal node (including the root) represents a test. Each leaf node represents a class.

The slide's figure (reconstructed) is the classic example tree:

age?
  <= 30  -> student?
              y -> Yes
              n -> No
  31..40 -> Yes
  > 40   -> credit?
              excellent -> Yes
              fair      -> No
Some facts
- Probably the most popular data mining algorithm
- We have been using it without knowing it
- A path from the root to a leaf node forms a rule
- Prediction is efficient
- Shapes and sizes can be controlled
- C4.5 can handle numeric attributes, missing values, and noisy data
- MS called their algorithm Decision Trees (plural) because:
  - It combines many different algorithms
  - The model may generate many trees
  - It can predict a nested column, predict many columns, and predict a continuous column
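The "path from root to leaf forms a rule" idea can be made concrete with a minimal sketch. The tree below encodes the slide's example figure as nested dicts (attribute names and branch labels follow the figure; the encoding itself is illustrative, not the MS algorithm's internal representation):

```python
# The example tree encoded as nested dicts: a dict is an internal
# node (a test), a string is a leaf (a class label).
TREE = {
    "attr": "age",
    "branches": {
        "<=30": {"attr": "student",
                 "branches": {"y": "Yes", "n": "No"}},
        "31..40": "Yes",
        ">40": {"attr": "credit",
                "branches": {"excellent": "Yes", "fair": "No"}},
    },
}

def predict(tree, case):
    """Follow one root-to-leaf path for the given case."""
    node = tree
    while isinstance(node, dict):          # internal node = a test
        node = node["branches"][case[node["attr"]]]
    return node                            # leaf = a class label

print(predict(TREE, {"age": "<=30", "student": "y"}))   # Yes
print(predict(TREE, {"age": ">40", "credit": "fair"}))  # No
```

Each call walks exactly one path, which is why prediction is efficient: the cost is the depth of the tree, not its size.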
Growing the tree
1. Correlate each input attribute with the prediction.
   For example, IQ can be H, M, or L, each with a count for attending college or not.
2. Select an attribute for the internal node based on, say, an entropy calculation.
3. Recursively work on each possible branch until all the attributes are considered.
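The steps above can be sketched in a few lines (a minimal ID3-style sketch; the toy IQ/rich data and the `rich` column are illustrative, not from the book):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(cases, attributes):
    """Pick the attribute whose split yields the lowest weighted entropy
    (equivalently, the highest information gain)."""
    def weighted_entropy(attr):
        total = 0.0
        for value in {case[attr] for case, _ in cases}:
            subset = [label for case, label in cases if case[attr] == value]
            total += len(subset) / len(cases) * entropy(subset)
        return total
    return min(attributes, key=weighted_entropy)

# Toy correlation-count data: IQ is predictive, "rich" is not.
cases = [
    ({"IQ": "H", "rich": "y"}, "college"),
    ({"IQ": "H", "rich": "n"}, "college"),
    ({"IQ": "M", "rich": "y"}, "college"),
    ({"IQ": "M", "rich": "n"}, "no"),
    ({"IQ": "L", "rich": "y"}, "no"),
    ({"IQ": "L", "rich": "n"}, "no"),
]
print(best_split(cases, ["IQ", "rich"]))  # IQ
```

Growing the full tree is then step 3: call `best_split` on each branch's subset of cases recursively until the attributes (or cases) run out.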
Entropy
The entropy concept was developed from the study of thermodynamic systems.
The second law states that in any irreversible process, entropy always increases. Entropy is a measure of disorder, so the second law says that in any irreversible process the disorder in the universe increases. For our purposes, the smaller the entropy, the better.
Since virtually all natural processes are irreversible, the entropy law implies that the universe is "running down": order, patterns, and structure all gradually disintegrate into random disorder.
The direction of time is from order to chaos.
Characteristics
- If the outcome of a case is certain (a single state), entropy has a value of zero.
- If all states are equally likely, entropy returns its maximum.
- With multiple states, different ways of decomposing the calculation give the same result. In the IQ H, M, L case, you can start with H vs. not-H and then split not-H into M and L, or compute H, M, L directly; the result is the same.
Steps
1. Build a correlation count table.
2. Calculate entropy (or another measurement).
Examples in the book
What the book means by Entropy(700, 400): the entropy of a node containing 700 cases of one class and 400 of the other.
Check all candidate splits. Why pick the one with the lowest entropy? Lower entropy means less disorder, i.e., purer groups.
count   p          log2(p)    p * log2(p)
 400    0.363636   -1.45943   -0.53070
 700    0.636364   -0.65208   -0.41496
1100                          entropy = 0.94566
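The table's numbers can be reproduced directly (a small sketch of the Entropy(700, 400) calculation from the book's example):

```python
import math

def entropy(counts):
    """Entropy of a class-count distribution, e.g. Entropy(700, 400)."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts)

h = entropy([700, 400])
print(round(h, 5))  # 0.94566
```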
Issue with attributes of many states
Zip code has many states. The options:
- Ignore the attribute.
- Keep it the same, so the tree is not very good.
- Group the values, e.g. by location characteristics such as population, beach access, and economics.
Over-Training
The size of the tree has no direct relation to the quality of the prediction.
A big tree sometimes only reflects the training data – this is called over-training (overfitting) and should be avoided.
Parameters

MAXIMUM_INPUT_ATTRIBUTES: Defines the number of input attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.

MAXIMUM_OUTPUT_ATTRIBUTES: Defines the number of output attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.

SCORE_METHOD: Determines the method that is used to calculate the split score. Available options: Entropy (1), Bayesian with K2 Prior (2), or Bayesian Dirichlet Equivalent (BDE) Prior (3). The default is 3.

SPLIT_METHOD: Determines the method that is used to split the node. Available options: Binary (1), Complete (2), or Both (3). The default is 3.

MINIMUM_SUPPORT: Determines the minimum number of leaf cases that is required to generate a split in the decision tree. The default is 10.

COMPLEXITY_PENALTY: Controls the growth of the decision tree. A low value increases the number of splits, and a high value decreases the number of splits. The default value is based on the number of attributes for a particular model:
- For 1 through 9 attributes, the default is 0.5.
- For 10 through 99 attributes, the default is 0.9.
- For 100 or more attributes, the default is 0.99.

FORCED_REGRESSOR: Forces the algorithm to use the indicated columns as regressors, regardless of the importance of the columns as calculated by the algorithm. This parameter is only used for decision trees that are predicting a continuous attribute.