Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Charles X. Ling, Univ of Western Ontario, Canada
Qiang Yang, HK UST, Hong Kong
Jianning Wang, Univ of Western Ontario, Canada
Shichao Zhang, UTS, Australia

Contact: cling@csd.uwo.ca

Outline

Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions

Costs in Machine Learning

Most inductive learning algorithms: minimizing classification errors
– Different types of misclassification have different costs, e.g. FP and FN

In this talk:
– Test costs should also be considered
– Cost-sensitive learning considers a variety of costs; see survey by Peter Turney (2000)

Applications

Medical Practice
– Doctors may ask a patient to go through a number of tests (e.g., blood tests, X-rays)
– Which of these new tests will bring about higher value?

Biological Experimental Design
– When testing a new drug, new tests are costly
– Which experiments to perform?

Previous Work

Many previous works consider the two types of cost separately – an obvious oversight

(Turney 1995): ICET, uses genetic algorithm to build trees to minimize the total cost
(Zubek and Dietterich 2002): a Markov Decision Process (MDP), searches in a state space for optimal policies
(Greiner et al. 2002): PAC learning

An Example of Our Problem

Training: with ?, cannot obtain values

ID    Fever   X-ray   Blood_1   Blood_2   …
C1    C2      C3      C4        C5
12    101     ?       H         ?         …    Yes
23    ?       L       M         L         …    No

Test: with many ?, may obtain values at a cost

ID    Fever   X-ray   Blood_1   Blood_2   …
C1    C2      C3      C4        C5
45    98      ?       ?         ?         …    ?
58    ?       ?       ?         ?         …    ?

Goal 1: build a tree that minimizes the total cost
Goal 2: obtain test values at a cost to minimize the total cost

Building Trees with Minimal Total Costs

Assumption: binary classes, costs: FP and FN
Goal: minimize total cost
– Total cost = misclassification cost + test cost

Previous Work
– Information Gain as an attribute selection criterion

In this work, need a new attribute selection criterion
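The cost definition above can be made concrete with a few lines of arithmetic. This is a minimal sketch, not the authors' code: the function names, the uniform per-test cost, and all numbers are hypothetical and chosen only for illustration.

```python
def misclassification_cost(false_pos, false_neg, fp_cost, fn_cost):
    """Cost of the errors: each false positive costs fp_cost, each false negative fn_cost."""
    return false_pos * fp_cost + false_neg * fn_cost


def total_cost(false_pos, false_neg, fp_cost, fn_cost, tests_performed, cost_per_test):
    """Total cost = misclassification cost + test cost (a uniform test cost is assumed here)."""
    test_cost = tests_performed * cost_per_test
    return misclassification_cost(false_pos, false_neg, fp_cost, fn_cost) + test_cost


# Hypothetical numbers: 3 false positives, 2 false negatives, 10 tests at 20 each.
example = total_cost(false_pos=3, false_neg=2, fp_cost=200, fn_cost=600,
                     tests_performed=10, cost_per_test=20)
print(example)  # 3*200 + 2*600 + 10*20 = 2000
```

A classifier that makes fewer errors but orders many expensive tests can therefore end up with a higher total cost than a cruder one, which is why an entropy-based criterion alone may not be enough.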

Attribute Selection Criterion: C4.5

Minimal total cost (C4.5: minimal entropy)
– If growing a tree has a smaller total cost
    then choose an attribute with minimal total cost
    else stop and form a leaf

Label leaf according to minimal total cost
– If (P×FN ≥ N×FP)
    then class = positive
    else class = negative
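The leaf-labelling rule can be sketched as follows. The comparison operator is not legible in the source; this sketch assumes P×FN ≥ N×FP, i.e. the leaf is labelled positive when labelling it negative would cost at least as much. The function name `label_leaf` and the cost values are hypothetical.

```python
def label_leaf(p, n, fp_cost, fn_cost):
    """Return (label, misclassification_cost) for a leaf with p positive and n negative examples."""
    cost_if_negative = p * fn_cost  # every positive example becomes a false negative
    cost_if_positive = n * fp_cost  # every negative example becomes a false positive
    # Assumed tie-breaking: on equal cost the leaf is labelled positive.
    if cost_if_negative >= cost_if_positive:
        return "positive", cost_if_positive
    return "negative", cost_if_negative


print(label_leaf(10, 20, fp_cost=200, fn_cost=600))  # ('positive', 4000)
print(label_leaf(10, 40, fp_cost=200, fn_cost=600))  # ('negative', 6000)
```

Note that the first leaf is labelled positive even though negatives are the majority, because a false negative is assumed to be three times as costly as a false positive.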

First, how to handle ? values in training data

Previous work
– built ? branch
– problematic

This work
– deal with unknown values in the training set:
– no branch for ? will be built,
– examples are “gathered” inside the internal node
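One way to read the selection criterion together with the treatment of ? values is sketched below. It rests on assumptions the slides do not spell out: examples whose value is unknown (`None`) pay no test cost, get no branch of their own, and are costed as a group that stays at the node. The helper names and the toy data are hypothetical.

```python
def leaf_cost(examples, fp_cost, fn_cost):
    """Misclassification cost of the cheaper of the two possible labels."""
    p = sum(1 for e in examples if e["label"] == 1)
    n = len(examples) - p
    return min(p * fn_cost, n * fp_cost)


def split_cost(examples, attr, test_cost, fp_cost, fn_cost):
    """Total cost of splitting on attr; unknown values stay at the node (assumption)."""
    known = [e for e in examples if e[attr] is not None]
    unknown = [e for e in examples if e[attr] is None]
    branches = {}
    for e in known:
        branches.setdefault(e[attr], []).append(e)
    cost = test_cost * len(known)  # only examples with a known value pay for the test
    cost += sum(leaf_cost(b, fp_cost, fn_cost) for b in branches.values())
    cost += leaf_cost(unknown, fp_cost, fn_cost)  # no "?" branch is built
    return cost


def choose_attribute(examples, test_costs, fp_cost, fn_cost):
    """Return (attribute, cost); attribute is None when forming a leaf is cheaper."""
    best_attr, best_cost = None, leaf_cost(examples, fp_cost, fn_cost)
    for attr, tc in test_costs.items():
        c = split_cost(examples, attr, tc, fp_cost, fn_cost)
        if c < best_cost:
            best_attr, best_cost = attr, c
    return best_attr, best_cost


data = [
    {"fever": "H", "label": 1}, {"fever": "H", "label": 1},
    {"fever": "L", "label": 0}, {"fever": "L", "label": 0},
    {"fever": None, "label": 1}, {"fever": None, "label": 0},
]
print(choose_attribute(data, {"fever": 20}, fp_cost=200, fn_cost=600))   # ('fever', 280)
print(choose_attribute(data, {"fever": 150}, fp_cost=200, fn_cost=600))  # (None, 600)
```

With a test cost of 20 the split pays for itself; at 150 the node becomes a leaf instead.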

Difference on ? values

Desirable Properties

1. Effect of difference between misclassification costs and the test costs

[Figure: example trees; all test costs are 20]

2. Prefer attribute with smaller test costs

Test costs per attribute:

       A1    A2    A3    A4    A5    A6
# 1    20    20    20    20    20    20
# 2   200    20   100   100   200   200
# 3   200   100   100   100    20   200

[Figure: trees built under each cost setting]

3. If test cost increases, attribute tends to be “pushed” down and “falls out” of the tree

[Figure: trees built with cost of A1 = 20, 50, and 80]
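The third property can be illustrated with a toy calculation at a single node. All numbers here are hypothetical and are not taken from the slides' figures: A1 is assumed to be informative (low remaining misclassification cost after the split), A2 cheap but less informative. The sketch only shows how the preferred attribute at the node changes as the test cost of A1 rises.

```python
def best_root(n_examples, leaf_cost, candidates):
    """Pick the attribute with the lowest total cost, or None if a leaf is cheaper.

    candidates maps attribute name -> (test_cost, remaining_misclassification_cost).
    """
    best, best_cost = None, leaf_cost
    for name, (test_cost, remaining) in candidates.items():
        total = n_examples * test_cost + remaining
        if total < best_cost:
            best, best_cost = name, total
    return best, best_cost


# 100 examples; labelling the node as a leaf would cost 6000 (hypothetical).
for a1_cost in (20, 50, 80):
    candidates = {"A1": (a1_cost, 500), "A2": (20, 3000)}
    print(a1_cost, best_root(100, 6000, candidates))
```

At cost 20, A1 is chosen (total 2500). At 50, A2 becomes cheaper (5000 versus 5500), so A1 can only appear further down. At 80, splitting on A1 costs more than forming a leaf (8500 versus 6000), so it would not be chosen at this node at all.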

Missing values in test cases

A new patient arrives:

Blood test   X-ray result   Urine test   S-test
?            good           ?            ?

OST: Intuition

Explain the intuition of OST here

Four Testing Strategies

First: Optimal Sequential Test (OST)
  (Simple batch test: do all tests)
Second: No test will be performed, predict with internal node
Third: No test will be performed, predict with weighted sum of subtrees
Fourth: A new tree is built dynamically for each test case using only the known attributes
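The first two strategies can be contrasted in a short sketch. It assumes one plausible reading of OST: while walking down the tree, a test is performed (and paid for) only when the attribute at the current node is unknown. The second strategy stops at that node and uses its label. The nested-dict tree layout, the attribute names, and the costs are hypothetical.

```python
def classify(tree, case, true_values, test_costs, perform_tests):
    """Walk the tree and return (predicted_label, test_cost_paid).

    perform_tests=True  -> sequential testing: buy an unknown value when a node needs it.
    perform_tests=False -> no tests: stop at the node and predict with its label.
    """
    paid = 0
    node = tree
    while "attr" in node:  # leaves carry no "attr" key
        attr = node["attr"]
        value = case.get(attr)
        if value is None:
            if not perform_tests:
                break  # predict with the internal node
            value = true_values[attr]  # outcome of the test that is performed
            paid += test_costs[attr]
        node = node["children"][value]
    return node["label"], paid


tree = {
    "attr": "xray", "label": "neg",
    "children": {
        "good": {"label": "neg"},
        "bad": {
            "attr": "blood", "label": "pos",
            "children": {"H": {"label": "pos"}, "L": {"label": "neg"}},
        },
    },
}
case = {"xray": "bad", "blood": None}
costs = {"xray": 50, "blood": 20}

print(classify(tree, case, {"blood": "L"}, costs, perform_tests=True))   # ('neg', 20)
print(classify(tree, case, {"blood": "L"}, costs, perform_tests=False))  # ('pos', 0)
```

The X-ray value is already known, so neither strategy pays for it; only the blood test is bought, and only under sequential testing.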

Experiment - settings

Five datasets, binary-class
60/40 for training/testing, repeat 5 times
Unknown values for training/test examples are selected randomly with a specified probability
Also compare to C4.5 tree, using OST for testing
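The experimental protocol can be sketched as below. This is an illustration under assumptions, not the authors' setup: the masking is applied independently per attribute value, the split is a plain shuffle, and the toy rows, attribute names, and seed are hypothetical.

```python
import random


def mask_unknown(rows, attributes, prob, rng):
    """Replace each attribute value with None (unknown) with probability prob."""
    return [
        {k: (None if k in attributes and rng.random() < prob else v) for k, v in row.items()}
        for row in rows
    ]


def split_60_40(rows, rng):
    """Shuffle a copy of rows and split it 60/40 into training and test sets."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = (len(shuffled) * 6) // 10
    return shuffled[:cut], shuffled[cut:]


rows = [{"a1": i % 2, "a2": i % 3, "label": i % 2} for i in range(10)]
rng = random.Random(0)  # fixed seed so the run is repeatable

for repeat in range(5):  # repeat 5 times, as in the settings above
    train, test = split_60_40(rows, rng)
    train = mask_unknown(train, {"a1", "a2"}, prob=0.4, rng=rng)
    test = mask_unknown(test, {"a1", "a2"}, prob=0.4, rng=rng)
    # ... build a tree on train, evaluate total cost on test ...
```

The class label is never masked, since only attribute values are treated as unknown.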

Results with different % of unknown

[Figure: x-axis: percentage of unknown attributes (20, 40, 60, 80); series: M1 (OST); No test, internal; C4.5 tree, OST; No test, lazy tree; No test, distributed]

OST is best; M4 and C4.5 next; M3 is worst
OST does not increase with more ?; others do overall

Results with different test costs

[Figure: x-axis: test costs (50, 100, 200, 400); series: M1 (OST); No test, internal; C4.5 tree, OST; No test, lazy tree]

With large test costs, OST = M2 = M3 = M4
C4.5 is much worse (tree building is cost-insensitive)

Results with unbalanced class costs

[Figure: x-axis: test costs (50, 100, 200, 400); series: M1 (OST); No test, internal; C4.5 tree, OST; No test, lazy tree]

With large test costs, OST = M2 = M4
C4.5 is much worse (tree building is cost-insensitive)
M3 is worse than M2… (M3 is used in C4.5)

Comparing OST/C4.5 across 6 datasets

OST always outperforms C4.5

[Figure: (a) x-axis: percentage of unknown attributes (20, 40, 60, 80); (b) x-axis: test costs (50, 100, 200, 400); y-axis ticks from 0 to 0.9; series: Ecoli, Breast, Heart, Thyroid, Australia]

Conclusions

New tree building algorithm for minimal costs
– Desirable properties
– Computationally efficient (similar to C4.5)

Test strategies (OST and batch) are very effective

Can solve many real-world diagnosis problems

Future Work

More intelligent “Batch Test” methods
Consider cost of additional batch test
– Optimal sequential batch test
    batch 1 = (test 1, test 2)
    batch 2 = (test 3, test 4, test 5), …

Other learning algorithms with minimal total cost

A wrapper that works for any “black box”
