Decision Trees
MS Algorithms
Decision Trees
The basic idea – creating a series of splits, also called nodes, in the tree.
The algorithm adds a node to the model every time an input column is found to be significantly correlated with the predictable column.
Predicting Discrete Columns
More on Decision Trees
Each internal node (including the root) represents a test. Each leaf node represents a class.

The slide's figure (reconstructed) is the classic example tree:

age?
  <= 30  -> student?
              y -> Yes
              n -> No
  31..40 -> Yes
  > 40   -> credit?
              excellent -> Yes
              fair      -> No
Some facts
- Probably the most popular data mining algorithm
- We have been using it without knowing it
- A path from the root to a leaf node forms a rule
- Prediction is efficient
- Shapes and sizes can be controlled
- C4.5 can handle numeric attributes, missing values, and noisy data
- MS called their algorithm Decision Trees (plural) because:
  - It combines many different algorithms
  - The model may generate many trees
  - It can predict a nested column, predict many columns, and predict a continuous column
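The "path from root to leaf forms a rule" idea can be made concrete with a minimal sketch. The tree below encodes the slide's example figure as nested dicts (attribute names and branch labels follow the figure; the encoding itself is illustrative, not the MS algorithm's internal representation):

```python
# The example tree encoded as nested dicts: a dict is an internal
# node (a test), a string is a leaf (a class label).
TREE = {
    "attr": "age",
    "branches": {
        "<=30": {"attr": "student",
                 "branches": {"y": "Yes", "n": "No"}},
        "31..40": "Yes",
        ">40": {"attr": "credit",
                "branches": {"excellent": "Yes", "fair": "No"}},
    },
}

def predict(tree, case):
    """Follow one root-to-leaf path for the given case."""
    node = tree
    while isinstance(node, dict):          # internal node = a test
        node = node["branches"][case[node["attr"]]]
    return node                            # leaf = a class label

print(predict(TREE, {"age": "<=30", "student": "y"}))   # Yes
print(predict(TREE, {"age": ">40", "credit": "fair"}))  # No
```

Each call walks exactly one path, which is why prediction is efficient: the cost is the depth of the tree, not its size.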
Growing the tree
1. Correlate each input attribute with the prediction.
   For example, IQ can be H, M, or L, each with a count for attending college or not.
2. Select an attribute for the internal node based on, say, an entropy calculation.
3. Recursively work on each possible branch until all the attributes are considered.
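The steps above can be sketched in a few lines (a minimal ID3-style sketch; the toy IQ/rich data and the `rich` column are illustrative, not from the book):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(cases, attributes):
    """Pick the attribute whose split yields the lowest weighted entropy
    (equivalently, the highest information gain)."""
    def weighted_entropy(attr):
        total = 0.0
        for value in {case[attr] for case, _ in cases}:
            subset = [label for case, label in cases if case[attr] == value]
            total += len(subset) / len(cases) * entropy(subset)
        return total
    return min(attributes, key=weighted_entropy)

# Toy correlation-count data: IQ is predictive, "rich" is not.
cases = [
    ({"IQ": "H", "rich": "y"}, "college"),
    ({"IQ": "H", "rich": "n"}, "college"),
    ({"IQ": "M", "rich": "y"}, "college"),
    ({"IQ": "M", "rich": "n"}, "no"),
    ({"IQ": "L", "rich": "y"}, "no"),
    ({"IQ": "L", "rich": "n"}, "no"),
]
print(best_split(cases, ["IQ", "rich"]))  # IQ
```

Growing the full tree is then step 3: call `best_split` on each branch's subset of cases recursively until the attributes (or cases) run out.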
Entropy
The entropy concept was developed from the study of thermodynamic systems.
The second law states that in any irreversible process, entropy always increases. Entropy is a measure of disorder, so the second law says that in any irreversible process the disorder in the universe increases. For our purposes, the smaller the entropy, the better.
Since virtually all natural processes are irreversible, the entropy law implies that the universe is "running down": order, patterns, and structure all gradually disintegrate into random disorder.
The direction of time is from order to chaos.
Characteristics
- If the outcome of a case is certain (a single state), entropy has a value of zero.
- If all states are equally likely, entropy returns its maximum.
- With multiple states, different ways of decomposing the calculation give the same result. In the IQ H, M, L case, you can start with H vs. not-H and then split not-H into M and L, or compute H, M, L directly; the result is the same.
Steps
1. Build a correlation count table.
2. Calculate entropy (or another measurement).
Examples in the book
What the book means by Entropy(700, 400): the entropy of a node containing 700 cases of one class and 400 of the other.
Check all candidate splits. Why pick the one with the lowest entropy? Lower entropy means less disorder, i.e., purer groups.
count   p          log2(p)    p * log2(p)
 400    0.363636   -1.45943   -0.53070
 700    0.636364   -0.65208   -0.41496
1100                          entropy = 0.94566
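The table's numbers can be reproduced directly (a small sketch of the Entropy(700, 400) calculation from the book's example):

```python
import math

def entropy(counts):
    """Entropy of a class-count distribution, e.g. Entropy(700, 400)."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts)

h = entropy([700, 400])
print(round(h, 5))  # 0.94566
```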
Issue with attributes of many states
Zip code has many states. The options:
- Ignore the attribute.
- Keep it the same, so the tree is not very good.
- Group the values, e.g. by location characteristics such as population, beach access, and economics.
Over-Training
The size of the tree has no direct relation to the quality of the prediction.
A big tree sometimes only reflects the training data – this is called over-training (overfitting) and should be avoided.
Parameters

MAXIMUM_INPUT_ATTRIBUTES: Defines the number of input attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.

MAXIMUM_OUTPUT_ATTRIBUTES: Defines the number of output attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.

SCORE_METHOD: Determines the method that is used to calculate the split score. Available options: Entropy (1), Bayesian with K2 Prior (2), or Bayesian Dirichlet Equivalent (BDE) Prior (3). The default is 3.

SPLIT_METHOD: Determines the method that is used to split the node. Available options: Binary (1), Complete (2), or Both (3). The default is 3.

MINIMUM_SUPPORT: Determines the minimum number of leaf cases that is required to generate a split in the decision tree. The default is 10.

COMPLEXITY_PENALTY: Controls the growth of the decision tree. A low value increases the number of splits, and a high value decreases the number of splits. The default value is based on the number of attributes for a particular model:
- For 1 through 9 attributes, the default is 0.5.
- For 10 through 99 attributes, the default is 0.9.
- For 100 or more attributes, the default is 0.99.

FORCED_REGRESSOR: Forces the algorithm to use the indicated columns as regressors, regardless of the importance of the columns as calculated by the algorithm. This parameter is only used for decision trees that are predicting a continuous attribute.