Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Charles X. Ling, Univ of Western Ontario, Canada
Qiang Yang, HK UST, Hong Kong
Jianning Wang, Univ of Western Ontario, Canada
Shichao Zhang, UTS, Australia

Contact: [email protected]


Page 2: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Outline

Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions

Page 3: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Costs in Machine Learning

Most inductive learning algorithms minimize classification errors
– Different types of misclassification have different costs, e.g. false positives (FP) and false negatives (FN)

In this talk:
– Test costs should also be considered
– Cost-sensitive learning considers a variety of costs; see the survey by Peter Turney (2000)

Page 4: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Applications

Medical Practice
– Doctors may ask a patient to go through a number of tests (e.g., blood tests, X-rays)
– Which of these new tests will bring about higher value?

Biological Experimental Design
– When testing a new drug, new tests are costly
– Which experiments to perform?

Page 5: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Previous Work

Many previous works consider the two types of cost (misclassification cost and test cost) separately, which is an obvious oversight

(Turney 1995): ICET, uses a genetic algorithm to build trees that minimize the total cost

(Zubek and Dietterich 2002): a Markov Decision Process (MDP), searches in a state space for optimal policies

(Greiner et al. 2002): PAC learning

Page 6: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

An Example of Our Problem

Training: with ?, cannot obtain values

ID   C1 (Fever)   C2 (X-ray)   C3 (Blood_1)   C4 (Blood_2)   C5 …   D
12   101          ?            H              ?              …      Yes
23   ?            L            M              L              …      No

Test: with many ?, may obtain values at a cost

ID   C1 (Fever)   C2 (X-ray)   C3 (Blood_1)   C4 (Blood_2)   C5 …   D
45   98           ?            ?              ?              …      ?
58   ?            ?            ?              ?              …      ?

Goal 1: build a tree that minimizes the total cost

Goal 2: obtain test values at a cost to minimize the total cost

Page 7: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Outline

Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions

Page 8: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Building Trees with Minimal Total Costs

Assumption: binary classes, with misclassification costs FP and FN

Goal: minimize total cost
– Total cost = misclassification cost + test cost

Previous Work
– Information gain as an attribute selection criterion

In this work, a new attribute selection criterion is needed
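As a rough illustration of the total-cost definition above, the following Python sketch sums misclassification cost and test cost over a few classified cases. The cost values, the attribute names, and the `total_cost` helper are assumptions made for illustration; they are not figures from the talk.

```python
# Hypothetical costs, chosen only for illustration.
FP_COST = 600   # cost of predicting positive for a negative case
FN_COST = 800   # cost of predicting negative for a positive case
TEST_COSTS = {"X-ray": 50, "Blood_1": 20}


def total_cost(cases):
    """Sum misclassification cost and test cost over classified cases.

    Each case is (tests_performed, predicted, actual), with classes
    given as "P" (positive) or "N" (negative).
    """
    cost = 0
    for tests_performed, predicted, actual in cases:
        # Test cost: pay for every test that was actually performed.
        cost += sum(TEST_COSTS[t] for t in tests_performed)
        # Misclassification cost: depends on the type of error.
        if predicted == "P" and actual == "N":
            cost += FP_COST
        elif predicted == "N" and actual == "P":
            cost += FN_COST
    return cost


cases = [
    (["X-ray"], "P", "P"),             # correct, one test: 50
    (["X-ray", "Blood_1"], "P", "N"),  # false positive: 70 + 600
    ([], "N", "P"),                    # false negative, no tests: 800
]
print(total_cost(cases))  # 50 + 670 + 800 = 1520
```

Skipping every test is not free here: the third case pays nothing for tests but incurs the full false-negative cost.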

Page 9: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Attribute Selection Criterion: C4.5

Minimal total cost (C4.5: minimal entropy)
– If growing a tree has a smaller total cost
  then choose an attribute with minimal total cost
  else stop and form a leaf
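The selection rule can be sketched in Python as a one-level lookahead. This is a minimal sketch under stated assumptions, not the authors' implementation: the cost constants, the helper names (`leaf_cost`, `split_cost`, `choose_attribute`), the use of `None` for a "?" value, and the choice to charge the test cost only for examples whose value is known are all illustrative.

```python
FP_COST = 600  # hypothetical cost of a false positive
FN_COST = 800  # hypothetical cost of a false negative


def leaf_cost(examples):
    """Misclassification cost of the cheaper label for these examples."""
    pos = sum(1 for _, label in examples if label == "P")
    neg = len(examples) - pos
    # Labeling positive misclassifies the negatives (neg * FP_COST);
    # labeling negative misclassifies the positives (pos * FN_COST).
    return min(neg * FP_COST, pos * FN_COST)


def split_cost(examples, attr, test_cost):
    """Total cost of splitting once on attr, then labeling each branch."""
    branches, unknown = {}, []
    for features, label in examples:
        value = features.get(attr)  # None stands for a "?" value
        if value is None:
            unknown.append((features, label))
        else:
            branches.setdefault(value, []).append((features, label))
    known = len(examples) - len(unknown)
    cost = known * test_cost    # assumption: test paid only for known values
    cost += leaf_cost(unknown)  # "?" examples stay at the node
    cost += sum(leaf_cost(b) for b in branches.values())
    return cost


def choose_attribute(examples, test_costs):
    """Return the attribute with minimal total cost, or None to form a leaf."""
    best_attr, best_cost = None, leaf_cost(examples)
    for attr, test_cost in test_costs.items():
        cost = split_cost(examples, attr, test_cost)
        if cost < best_cost:  # split only if it is strictly cheaper
            best_attr, best_cost = attr, cost
    return best_attr


examples = [
    ({"A1": "h", "A2": "x"}, "P"),
    ({"A1": "h", "A2": "y"}, "P"),
    ({"A1": "l", "A2": "x"}, "N"),
    ({"A1": "l", "A2": "y"}, "N"),
    ({"A1": None, "A2": "x"}, "P"),
]
print(choose_attribute(examples, {"A1": 20, "A2": 20}))   # A1
print(choose_attribute(examples, {"A1": 300, "A2": 20}))  # None
```

With a cheap A1 the split costs 80 against a leaf cost of 1200, so A1 is chosen. Raising the cost of A1 to 300 makes the split cost 1200, which is no longer cheaper than the leaf, so the node becomes a leaf.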

Page 10: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Label leaf according to minimal total cost

If (P×FN ≥ N×FP)
then class = positive
else class = negative

(P and N are the numbers of positive and negative examples at the leaf)
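The labeling rule can be sketched as follows, assuming P and N are the counts of positive and negative training examples at the leaf and that the comparison is P×FN ≥ N×FP (label the leaf positive when doing so costs no more than labeling it negative). The function name and the numbers are illustrative.

```python
def label_leaf(num_pos, num_neg, fp_cost, fn_cost):
    """Return the leaf label with the smaller misclassification cost."""
    # Labeling negative turns every positive example into a false negative.
    cost_if_negative = num_pos * fn_cost
    # Labeling positive turns every negative example into a false positive.
    cost_if_positive = num_neg * fp_cost
    return "positive" if cost_if_negative >= cost_if_positive else "negative"


# 3 positives and 7 negatives: the majority class is negative, but a
# false negative is five times as costly, so the leaf is labeled positive.
print(label_leaf(num_pos=3, num_neg=7, fp_cost=100, fn_cost=500))  # positive
print(label_leaf(num_pos=3, num_neg=7, fp_cost=100, fn_cost=100))  # negative
```

With equal FP and FN costs the rule reduces to ordinary majority voting; unequal costs can flip the label toward the minority class.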

Page 11: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Difference on ? values

First, how to handle ? values in training data

Previous work
– built a ? branch
– problematic

This work
– deals with unknown values in the training set
– no branch for ? will be built
– examples are “gathered” inside the internal nodes

Page 12: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Desirable Properties

1. Effect of difference between misclassification costs and the test costs

[Figure: three example trees, built when all test costs are 0, when all test costs are 20, and when all test costs are 300]

Page 13: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

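2. Prefer attribute with smaller test costs

Test costs under three settings:

      A1    A2    A3    A4    A5    A6
#1    20    20    20    20    20    20
#2    200   20    100   100   200   200
#3    200   100   100   100   20    200

[Figure: the trees built under settings #1, #2, and #3. In #1 the root is A1 with A6 below; in #2 the root is A2 with A1 below; in #3 the root is A5 with A1 below]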

Page 14: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

3. If test cost increases, attribute tends to be “pushed” down and “falls out” of the tree

[Figure: trees for cost of A1 = 20, 50, and 80. At 20, A1 is the root with A6 below; at 50, A6 is the root with A1 below; at 80, A6 is the root with A2 below and A1 no longer appears]

Page 15: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Outline

Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions

Page 16: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Missing values in test cases

A new patient arrives:

Blood test   X-ray result   Urine test   S-test
?            good           ?            ?

Page 17: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

OST: Intuition

Explain the intuition of OST here

Page 18: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Four Testing Strategies

First: Optimal Sequential Test (OST)
(Simple batch test: do all tests)

Second: No test will be performed, predict with internal node

Third: No test will be performed, predict with weighted sum of subtrees

Fourth: A new tree is built dynamically for each test case using only the known attributes

[Figure: two example trees, one with root A1 and A6 subtrees, one with A1 alone]
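The contrast between the first two strategies can be sketched in Python. This is a simplified illustration, not the authors' procedure: the `Node` class, the function names, and the toy tree are hypothetical, values that are already known are treated as free, and the sequential strategy is assumed to perform every test the tree asks for along the path.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str                    # minimal-cost label for examples at this node
    attr: Optional[str] = None    # attribute tested here; None for a leaf
    children: dict = field(default_factory=dict)


def classify_sequential(node, case, true_values, test_costs):
    """First strategy (sketch): obtain each missing value the tree asks for."""
    spent = 0
    while node.attr is not None:
        value = case.get(node.attr)
        if value is None:                    # "?" value in the test case
            value = true_values[node.attr]   # perform the test
            spent += test_costs[node.attr]   # and pay for it
        node = node.children[value]
    return node.label, spent


def classify_internal(node, case):
    """Second strategy (sketch): no tests; stop at the first unknown value."""
    while node.attr is not None and case.get(node.attr) is not None:
        node = node.children[case[node.attr]]
    return node.label, 0


# Toy tree: A1 at the root, A6 below the "h" branch.
tree = Node("P", "A1", {
    "h": Node("P", "A6", {"x": Node("P"), "y": Node("N")}),
    "l": Node("N"),
})
case = {"A1": "h", "A6": None}   # A6 is unknown for this test case
true_values = {"A6": "y"}        # what the A6 test would reveal
test_costs = {"A1": 20, "A6": 20}

print(classify_sequential(tree, case, true_values, test_costs))  # ('N', 20)
print(classify_internal(tree, case))                             # ('P', 0)
```

The sequential strategy pays 20 for the A6 test and reaches a leaf, while the second strategy pays nothing and falls back on the label stored at the A6 node. Which one has the lower total cost depends on how the test costs compare with the misclassification costs.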

Page 19: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Outline

Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions

Page 20: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Experiment - settings

Five datasets, all binary-class
60/40 split for training/testing, repeated 5 times
Unknown values for training/test examples are selected randomly with a specified probability
Also compared with a C4.5 tree, using OST for testing
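A setup of this kind could be sketched as below. The function name, the "class" key, the seed handling, and the toy rows are assumptions for illustration; the talk does not describe how the split and the masking were actually implemented.

```python
import random


def split_and_mask(rows, unknown_prob, train_frac=0.6, seed=0):
    """Shuffle, split 60/40, and replace attribute values with "?" at random."""
    rng = random.Random(seed)
    rows = [dict(r) for r in rows]  # copy so the input is left untouched
    rng.shuffle(rows)
    for row in rows:
        for attr in row:
            # Never hide the class label; mask each attribute independently.
            if attr != "class" and rng.random() < unknown_prob:
                row[attr] = "?"
    cut = round(len(rows) * train_frac)
    return rows[:cut], rows[cut:]


# Toy data: 10 rows with two attributes and a binary class.
rows = [{"A1": i, "A2": i * 2, "class": "P" if i % 2 else "N"}
        for i in range(10)]

# Repeat the random split 5 times, here with 40% unknown values.
for run in range(5):
    train, test = split_and_mask(rows, unknown_prob=0.4, seed=run)
    print(run, len(train), len(test))
```

Using a different seed per run keeps the five repetitions independent while leaving each one reproducible.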

Page 21: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Results with different % of unknown

[Figure: results plotted against the percentage of unknown attributes (20, 40, 60, 80) for M1 (OST), M2 (no test, internal), M3 (no test, distributed), M4 (no test, lazy tree), and C4.5 (C4.5 tree, OST)]

OST is best; M4 and C4.5 are next; M3 is worst
The cost of OST does not increase with more ? values; the others do overall

Page 22: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Results with different test costs

[Figure: results plotted against test costs (50, 100, 200, 400) for M1 (OST), M2 (no test, internal), M3 (no test, distributed), M4 (no test, lazy tree), and C4.5 (C4.5 tree, OST)]

With large test costs, OST = M2 = M3 = M4
C4.5 is much worse (tree building is cost-insensitive)

Page 23: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Results with unbalanced class costs

[Figure: results plotted against test costs (50, 100, 200, 400) for M1 (OST), M2 (no test, internal), M3 (no test, distributed), M4 (no test, lazy tree), and C4.5 (C4.5 tree, OST)]

With large test costs, OST = M2 = M4
C4.5 is much worse (tree building is cost-insensitive)
M3 is worse than M2… (M3 is used in C4.5)

Page 24: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Comparing OST/C4.5 across 6 datasets

OST always outperforms C4.5

[Figure: two panels comparing OST with C4.5 on the Ecoli, Breast, Heart, Thyroid, and Australia datasets: (a) against the percentage of unknown attributes (20, 40, 60, 80); (b) against test costs (50, 100, 200, 400)]

Page 25: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Outline

Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions

Page 26: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Conclusions

New tree building algorithm for minimal costs
– Desirable properties
– Computationally efficient (similar to C4.5)

Test strategies (OST and batch) are very effective

Can solve many real-world diagnosis problems

Page 27: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Future Work

More intelligent “Batch Test” methods

Consider cost of additional batch test
– Optimal sequential batch test
  batch 1 = (test 1, test 2)
  batch 2 = (test 3, test 4, test 5), …

Other learning algorithms with minimal total cost

A wrapper that works for any “black box”