Chapter 8: Decision Tree Algorithms

Rule Based

Suitable for automatic generation


Contents

Presents the concept of decision tree models

Discusses the concept of rule interestingness

Demonstrates decision tree rules on a case

Reviews real applications of decision tree models

Shows the application of decision tree models to larger data sets

Demonstrates See5 decision tree analysis in the appendix


Problems

Grocery stores have a massive data problem in inventory control, dealt with to a high degree by bar-coding.

The massive database of transactions can be mined to monitor customer demand.

Decision trees provide a means to obtain product-specific forecasting models in the form of IF-THEN rules that are easy to implement.

Decision trees can be used by grocery stores in a number of policy decisions, including ordering inventory replenishment and evaluating alternative promotion campaigns.


Decision tree

Decision tree refers to a tree structure of rules (often association rules).

The decision tree modeling process involves collecting the variables that the analyst thinks might bear on the decision at issue, and analyzing these variables for their ability to predict the outcome.

The algorithm automatically determines which variables are most important, based on their ability to sort the data into the correct output category.

Decision trees have a relative advantage over ANN and GA in that they provide a reusable set of rules, thus explaining model conclusions.


Decision Tree Applications

Classifying loan applications, screening potential consumers, and rating job applicants.

Decision trees provide a way to implement a rule-based system approach.

[Figure: taxonomy of supervised learning models, branching into tree structures, association models, and rule induction]

Decision trees (categorical attributes): ID3, C4.5/C5, CART

Regression trees (continuous attributes): CART, M5, Cubist

Rule induction: CN2, ITRULE


Types of Trees

Classification tree: variable values are classes; finite conditions

Regression tree: variable values are continuous numbers; used for prediction or estimation


Rule Induction

Automatically processes data: classification (logical, easier) or regression (estimation, messier)

Searches through data for patterns and relationships: pure knowledge discovery

Assumes no prior hypothesis; disregards human judgment


Decision trees

Logical branching

Historical: ID3 (an early rule-generating system), C4.5, See5

Branches: the different possible values of a variable

Nodes: the points from which branches emanate


Decision tree operation

A bank may have a database of past loan applicants for short-term loans (see Table 4.4).

The bank's policy treats applicants differently by age group, income level, and risk.

A tree sorts the possible combinations of these variables. An exhaustive tree enumerates all combinations of variable values, as in Table 8.1.


Decision tree operation


Decision tree operation

A rule-based system model would require interviewing bank loan officers with respected judgment to classify the decision for each of these combinations of variables.

Some combinations can be reduced directly in the decision tree.


Rule interestingness

Data, even categorical data, can potentially involve many rules.

In Table 8.1, 3 × 3 × 3 = 27 combinations. With 10 variables, each with 4 possible values, there would be over a million combinations (4^10 = 1,048,576): unreasonable to enumerate.

Decision tree models identify the most useful rules in terms of predicting outcomes. Rule effectiveness is measured in terms of confidence and support. Confidence is the degree of accuracy of a rule. Support is the degree to which the antecedent condition occurs in the data.


Tanagra Example


Support & Confidence

Support for an association rule indicates the proportion of records covered by the set of attributes in the association rule.

Example: if there were 10 million book purchases and a rule's antecedent covered 10 of them, support for the rule would be 10/10,000,000, a very small support measure of 0.000001. These concepts are often used in the form of threshold levels in machine learning systems.

Minimum confidence and support levels can be specified to retain rules identified by the decision tree.
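To make the two measures concrete, here is a minimal sketch in Python; the record fields and the example rule are hypothetical illustrations, not the chapter's data.

```python
# Support: fraction of all records the rule's IF-part covers.
# Confidence: fraction of covered records where the THEN-part also holds.

def support_confidence(records, antecedent, consequent):
    """antecedent and consequent are predicates: record -> bool."""
    n = len(records)
    covered = [r for r in records if antecedent(r)]
    correct = [r for r in covered if consequent(r)]
    support = len(covered) / n
    confidence = len(correct) / len(covered) if covered else 0.0
    return support, confidence

# Hypothetical records in the spirit of the loan example
records = [
    {"Risk": "low", "OnTime": True},
    {"Risk": "low", "OnTime": True},
    {"Risk": "high", "OnTime": False},
    {"Risk": "average", "OnTime": True},
]
s, c = support_confidence(records,
                          antecedent=lambda r: r["Risk"] == "low",
                          consequent=lambda r: r["OnTime"])
print(f"support={s:.2f}, confidence={c:.2f}")  # support=0.50, confidence=1.00
```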


Machine learning

Rule-induction algorithms can automatically process categorical data (and can also work on continuous data). A clear outcome is needed.

Rule induction works by searching through data for patterns and relationships.

Machine learning starts with no assumptions, looking only at input data and results.

Recursive partitioning algorithms split the original data into finer and finer subsets, leading to a decision tree, as sketched below.
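As a rough illustration of how recursive partitioning proceeds, the sketch below greedily picks the attribute whose split yields the lowest weighted entropy and recurses on each subset. Attribute and class names are placeholders; real packages such as C4.5, CART, or See5 add pruning, stopping thresholds, and continuous-attribute handling.

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def weighted_entropy(rows, attr, target):
    # Entropy after splitting on attr, weighted by subset size.
    n = len(rows)
    total = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        total += len(subset) / n * entropy(subset)
    return total

def build_tree(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:           # pure node or no splits left
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = min(attrs, key=lambda a: weighted_entropy(rows, a, target))
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = build_tree(subset,
                                       [a for a in attrs if a != best], target)
    return tree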


Cases

20 past loan application cases in Table 8.3.


Cases

Automatic machine learning begins with identifying the variables that offer the greatest likelihood of distinguishing between the possible outcomes.

For each of the three variables, the outcome probabilities are illustrated in Table 8.5 (next slide).

Most data mining packages use an entropy measure to gauge the discriminating power of each variable for splitting the data (chi-square measures can also be used to select variables).


Cases

Table 8.4 Grouped data

Table 8.5 Combination outcomes


Entropy formula

The information (entropy) measure for each value of an attribute, where p is the number of positive examples and n is the number of negative examples in the training set for that value:

$$\text{Inform} = -\frac{p}{p+n}\log_2\left(\frac{p}{p+n}\right) - \frac{n}{p+n}\log_2\left(\frac{n}{p+n}\right)$$

The lower the measure (entropy), the greater the information content.

Can be used to automatically select the variable with the most productive rule potential.
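A quick numeric check of the formula in Python, guarding the p = 0 or n = 0 case noted on the next slide (0 × log2(0) is treated as 0 by convention):

```python
import math

def inform(p, n):
    # Entropy of a (positive, negative) split; 0*log2(0) treated as 0.
    def term(x):
        return 0.0 if x == 0 else -(x / (p + n)) * math.log2(x / (p + n))
    return term(p) + term(n)

print(inform(8, 4))  # Young category: 0.918
print(inform(4, 1))  # Middle: 0.722
print(inform(3, 0))  # Old: 0.0
```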


Entropy formula

The entropy formula has a problem if either p or n is 0, since log2(0) is undefined; the convention is to treat 0 × log2(0) as 0. Entropy for each Age category generated by the formula is shown in Table 8.6.

Category Young: [−(8/12) × (−0.585) − (4/12) × (−1.585)] × (12/20) = 0.551

The lower the entropy measure, the greater the information content (the greater the agreement probability).

Rule: If (Risk = low) Then predict on-time payment; Else predict late



Entropy

Young: [−(8/12) × (−0.585) − (4/12) × (−1.585)] × 12/20 = 0.551

Middle: [−(4/5) × (−0.322) − (1/5) × (−2.322)] × 5/20 = 0.180

Old: [−(3/3) × 0 − (0/3) × 0] × 3/20 = 0.000

Age sum: 0.731; Income: 0.782; Risk: 0.446 (lowest)

By these measures, Risk has the greatest information content. If Risk is low, the data indicate a 1.0 probability that the applicant will pay the loan back on time.
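The Age sum can be verified from the (on-time, late) counts implied by the calculations above: each category's entropy is weighted by the share of the 20 cases it covers. A minimal sketch; the Income (0.782) and Risk (0.446) sums follow the same pattern from their own counts in Table 8.4, not repeated here.

```python
import math

def inform(p, n):
    # Entropy of a (positive, negative) split; 0*log2(0) treated as 0.
    def term(x):
        return 0.0 if x == 0 else -(x / (p + n)) * math.log2(x / (p + n))
    return term(p) + term(n)

# (on-time, late) counts per Age category from the slide calculations
age_counts = {"Young": (8, 4), "Middle": (4, 1), "Old": (3, 0)}
total = sum(p + n for p, n in age_counts.values())  # 20 cases
age_sum = sum((p + n) / total * inform(p, n) for p, n in age_counts.values())
print(round(age_sum, 3))  # 0.731 = 0.551 + 0.180 + 0.000
```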


Evaluation

Two types of errors may occur:

1. Applicants rated as low risk may actually not pay on time (from the data, the probability of this case is 0.0).

2. Applicants rated as high or average risk may actually have paid if given a loan (from the data, the probability of this happening is 5/20 = 0.25).

Expected error: 0.25 × 0.5 (the probability of being wrong) = 0.125. Test the model using another set of data.


Evaluation

The entropy measure for Age, given that Risk was not low, was 0.99, while the same calculation for Income was 1.971. Age has greater discriminating power.

If Age is middle, the one such case did not pay on time.

If (Risk is Not low) AND (Age=Middle)

Then Predict late

Else Predict On-time


Evaluation

For the last variable, Income, given that Risk was not low and Age was not Middle, there are nine cases left, shown in Table 8.8.

A third rule takes advantage of the cases with a unanimous outcome:

If (Risk NOT low) AND (Age NOT middle) AND (Income high)

Then predict Late

Else predict On-time

See page 141 for more explanation.
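Taken together, the three induced rules amount to a small IF-THEN classifier. A minimal sketch; the attribute values follow the slides, and the value spellings are illustrative.

```python
def predict(risk, age, income):
    if risk == "low":
        return "on-time"   # Rule 1
    if age == "middle":
        return "late"      # Rule 2 (Risk not low)
    if income == "high":
        return "late"      # Rule 3 (Risk not low, Age not middle)
    return "on-time"       # default

print(predict("low", "young", "high"))      # on-time
print(predict("average", "middle", "low"))  # late
```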


Rule accuracy

The expected accuracy of the three rules is shown in Table 8.9.

The expected error is 0.375 (1 − 0.625).

An additional rule could be generated for the case of Risk not low, Age not middle, and Income not high: four cases with low income (probability of on-time = 0.5) and four cases with average income (probability of on-time = 0.75).

The greater discrimination is provided by average income, resulting in the following rule:

If (Risk not low) AND (Age not middle) AND (Income average)

Then predict On-time

Else predict either


Rule accuracy

There is no added accuracy obtained with this rule, as shown in Table 8.10.

The expected error is 4/20 × 0.5 = 0.10, the same as without the rule.

When machine learning methods encounter no improvement, they generally stop.


Rule accuracy

Table 8.11 shows the results.


Rule accuracy


Inventory Prediction

Groceries: maybe over 100,000 SKUs; barcode data input

Data mining to discover patterns: random sample of over 1.6 million records, 30 months, 95 outlets; test sample of 400,000 records

Rule induction more workable than regression: 28,000 rules; very accurate, up to 27% improvement


Clinical Database

Headache: over 60 possible causes

Exclusive reasoning uses negative rules: use when a symptom is absent

Inclusive reasoning uses positive rules

Probabilistic rule induction expert system:

Headache: training sample of over 50,000 cases, 45 classes, 147 attributes

Meningitis: 1,200 samples on 41 attributes, 4 outputs


Clinical Database

Used AQ15, C4.5: average accuracy 82%

Expert system: average accuracy 92%

Rough set rule system: average accuracy 70%

Using both positive and negative rules from rough sets: average accuracy over 90%


Software Development Quality

Telecommunications company

Goal: find patterns in modules being developed that are likely to contain faults discovered by customers. A typical module has several million lines of code; the probability of a fault averaged 0.074.

Apply greater effort (specification, testing, inspection) to those modules.


Software Quality

Preprocessed and reduced the data; used CART (Classification & Regression Trees); could specify prior probabilities

First model: 9 rules, 6 variables; better at cross-validation, but variable values not available until late

Second model: 4 rules, 2 variables; about the same accuracy, with data available earlier


Rules and evaluation


Rules and evaluation

The second model's rules:

The two models were very close in accuracy. The first model was better in cross-validation accuracy, but its variables were available only just prior to release. The second model had the advantage of being based on data available at an earlier stage and required less extensive data reduction. See also page 146 for the expert system.


Applications of methods to larger data sets

An expenditure application to find the characteristics of potential customers for each expenditure category.

A simple case is to categorize clothing expenditures (or other expenditures in the data set) per year as a two-class classification problem.

Data preparation and data transformation: see page 154

Comparisons of A-priori, C4.5, and C5.0


Fuzzy Decision Trees

So far we have assumed distinct (crisp) outcomes.

Many data points are not that clear.

Fuzzy: a membership function represents degree of belief (between 0 and 1).

Fuzzy relationships have been incorporated in decision tree algorithms.


Fuzzy Example

Age: Young 0.3, Middle 0.9, Old 0.2

Income: Low 0.0, Average 0.8, High 0.3

Risk: Low 0.1, Average 0.8, High 0.3

Definitions:
Memberships will not necessarily sum to 1.0
If ambiguous, select the alternative with the larger membership value
Aggregate with the mean


Fuzzy Model

IF Risk = Low THEN On-time. Membership function: 0.1

IF Risk NOT Low & Age = Middle THEN Late. Risk: MAX(0.8, 0.3) = 0.8; Age: 0.9. Membership function: mean = 0.85


Fuzzy Model cont.

IF Risk NOT Low & Age NOT Middle & Income=High THEN LateRisk MAX(0.8, 0.3) 0.8Age MAX(0.3, 0.2) 0.3Income 0.3Membership function: Mean = 0.433


Fuzzy Model cont.

IF Risk NOT Low & Age NOT Middle & Income NOT High THEN LateRisk MAX(0.8, 0.3) 0.8Age MAX(0.3, 0.2) 0.3Income MAX(0.0, 0.8) 0.8Membership function: Mean = 0.633


Fuzzy Model cont.

The highest membership function is 0.633, for Rule 4.

Conclusion: On-time
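The whole fuzzy evaluation can be reproduced in a few lines, taking NOT over a fuzzy variable as the maximum membership of the remaining alternatives and aggregating condition memberships with the mean, as the slides define. A minimal sketch using the example applicant's values:

```python
# Membership values for the example applicant (from the Fuzzy Example slide)
age = {"young": 0.3, "middle": 0.9, "old": 0.2}
income = {"low": 0.0, "average": 0.8, "high": 0.3}
risk = {"low": 0.1, "average": 0.8, "high": 0.3}

def not_(fuzzy_set, excluded):
    # NOT X: the largest membership among the other alternatives.
    return max(v for k, v in fuzzy_set.items() if k != excluded)

mean = lambda xs: sum(xs) / len(xs)

rules = {
    "R1: Risk low -> on-time": risk["low"],
    "R2: Risk not low & Age middle -> late":
        mean([not_(risk, "low"), age["middle"]]),
    "R3: Risk not low & Age not middle & Income high -> late":
        mean([not_(risk, "low"), not_(age, "middle"), income["high"]]),
    "R4: Risk not low & Age not middle & Income not high -> on-time":
        mean([not_(risk, "low"), not_(age, "middle"), not_(income, "high")]),
}
for name, strength in rules.items():
    print(f"{name}: {strength:.3f}")
# R4 has the highest strength (0.633), so the conclusion is on-time.
```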


Decision Trees

Very effective and useful

Automatic machine learning, thus unbiased (but omits judgment)

Can handle very large data sets; not affected much by missing data

Lots of software available