
MULTI-INTERVAL DISCRETIZATION OF CONTINUOUS VALUED ATTRIBUTES FOR CLASSIFICATION LEARNING

KIRANKUMAR K. TAMBALKAR

What is Discretization?

Discretization is the process of transforming continuous functions, models, and equations into discrete values.

This process is usually carried out as a first step towards making them suitable for numerical evaluation and implementation on digital computers.

Why Discretization?

The main aim is to reduce the large number of values of a continuous attribute to a small number of discrete intervals.

Typically, the data is discretized into K partitions of equal length/width (equal-width intervals).
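As a quick illustration, a minimal equal-width binning sketch in Python (the function name and the sample ages are hypothetical, not from the presentation):

    # A minimal equal-width binning sketch: split an attribute's range into K
    # equal intervals and assign each value the index of the interval it falls in.
    def equal_width_bins(values, k):
        lo, hi = min(values), max(values)
        width = (hi - lo) / k
        # Clamp the maximum value into the last bin (index k-1).
        return [min(int((v - lo) / width), k - 1) for v in values]

    ages = [22, 25, 31, 38, 44, 47, 52, 61, 63, 70]
    print(equal_width_bins(ages, 4))   # -> [0, 0, 0, 1, 1, 2, 2, 3, 3, 3]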

Discretization

Discretization of continuous-valued attributes:

First, we present the result about information entropy minimization, a heuristic for binary discretization (two-interval splits).

Then, a better understanding of the heuristic and its behavior.

Finally, formal evidence that supports the usage of the heuristic in this context.

Binary Discretization

A continuous-valued attribute is typically discretized during decision tree generation by partitioning its range into two intervals.

A threshold value T for the continuous attribute A is determined, and the test A <= T is assigned to the left branch while A > T is assigned to the right branch.

We call such a threshold value T a cut point.

What is Entropy?

Entropy, also called expected information entropy, is a value that essentially describes how consistently a potential split will match up with the class labels.

Example: suppose we look at the group below age 25. Out of that group, how many people can we expect to have an income above 50K, and how many below 50K?

Lower entropy is better, and an entropy value of 0 is the best.
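For example, if 6 of the 10 people in that under-25 group earn below 50K and the other 4 earn above, the entropy is -(0.6 log2 0.6 + 0.4 log2 0.4) ≈ 0.97 bits; if all 10 fall into the same income class, the entropy is 0.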

Data set example

Feature f1 (attribute values)    Feature f2 (attribute values)    Class Label
a1                               b1                               (class label)
a2                               b2                               (class label)
a3                               b3                               (class label)
a4                               b4                               (class label)
a5                               b5                               (class label)
a6                               b6                               (class label)
a7                               b7                               (class label)
a8                               b8                               (class label)
a9                               b9                               (class label)
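Purely as an illustration, such a data set could be represented in Python as a list of (f1, f2, class-label) tuples; the values below are placeholders, not data from the slides:

    # Each example pairs two feature values with a class label (placeholder values).
    dataset = [
        ("a1", "b1", "class_1"),
        ("a2", "b2", "class_2"),
        ("a3", "b3", "class_1"),
        # ... one tuple per example, up to ("a9", "b9", ...)
    ]

    # Individual columns can be pulled out for per-attribute processing.
    f1_values = [f1 for f1, f2, label in dataset]
    class_labels = [label for f1, f2, label in dataset]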

Algorithm

Binary Discretization

We select an attribute for branching at a node holding a set S of N examples. For each continuous-valued attribute A, we select the "best" cut point TA from its range of values by evaluation.

First, the examples are sorted into increasing order of attribute A, and the midpoint between each successive pair of examples in the sorted sequence is evaluated as a potential cut point.

Thus, for each continuous-valued attribute, N-1 evaluations take place. For each evaluation of a candidate cut point T, the data is partitioned into two sets, and the class entropy of the resulting partition is computed.
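A minimal Python sketch of this search for a single continuous attribute (the function names and toy data are illustrative, not the authors' code):

    from math import log2
    from collections import Counter

    def entropy(labels):
        """Class entropy: Ent(S) = -sum_i P(Ci, S) * log2 P(Ci, S)."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def best_cut_point(values, labels):
        """Return the cut point T minimizing the weighted class entropy E(A, T; S)."""
        data = sorted(zip(values, labels))            # sort examples by attribute A
        n = len(data)
        best_t, best_e = None, float("inf")
        for i in range(n - 1):                        # N-1 candidate midpoints
            t = (data[i][0] + data[i + 1][0]) / 2     # midpoint of successive values
            s1 = [lab for v, lab in data if v <= t]   # left partition  (A <= T)
            s2 = [lab for v, lab in data if v > t]    # right partition (A > T)
            if not s1 or not s2:                      # skip degenerate splits
                continue
            e = (len(s1) / n) * entropy(s1) + (len(s2) / n) * entropy(s2)
            if e < best_e:
                best_t, best_e = t, e
        return best_t, best_e

    # Toy data: ten attribute values with two class labels.
    values = [8, 9, 7, 3, 2, 5, 1, 6, 4, 10]
    labels = ["B", "B", "B", "A", "A", "B", "A", "B", "A", "B"]
    print(best_cut_point(values, labels))   # best cut 4.5 cleanly separates A from B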

Example

An attribute whose values appear in the order 8, 9, 7, 3, 2, 5, 1, 6, 4, 10 is first sorted into increasing order: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. The midpoints between successive values in the sorted sequence are then the candidate cut points.

Algorithm

Binary Discretization

Let T partition the set S of examples into the subsets S1 and S2. Let there be k classes C1, ..., Ck, and let P(Ci, S) be the proportion of examples in S that have class Ci. The class entropy of the subset S is then defined as:

Ent(S) = -\sum_{i=1}^{k} P(C_i, S) \log_2 P(C_i, S)

Algorithm

Binary Discretization

When the logarithm base is 2, Ent(S) measures the amount of information needed, in bits, to specify the classes in S. To evaluate the resulting class entropy after the set S is partitioned into two sets S1 and S2, we take the weighted average of their entropies.

Algorithm

Example

For an example set S, an attribute A, and a cut point value T, let S1 ⊆ S be the subset of examples in S with A-values <= T, and let S2 = S - S1. The class information entropy of the partition induced by T, denoted E(A, T; S), is defined as:

E(A, T; S) = \frac{|S_1|}{|S|} Ent(S_1) + \frac{|S_2|}{|S|} Ent(S_2)

Algorithm

Binary Discretization

A binary discretization for A is determined by selecting the cut point TA for which E(A, TA; S) is minimal among all the candidate cut points.

Gain of the Entropy

Once we have found the minimum among all the candidate cut points, we compute the gain in entropy.

How do we compute the gain of the entropy? The gain of a cut point T is the reduction in class entropy that it produces:

Gain(A, T; S) = Ent(S) - E(A, T; S)
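For instance, if Ent(S) = 1.0 bit for the unsplit set and the best cut point yields E(A, T; S) = 0.6 bits, the gain is 1.0 - 0.6 = 0.4 bits (these numbers are purely illustrative).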

MDLPC Criterion

The Minimum Description Length Principle: once we have computed the gain of the entropy, we are ready to state our decision criterion for accepting or rejecting a given partition, based on the MDLP.

MDLPC Criterion
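Stated concretely, with N the number of examples in S and k, k1, k2 the numbers of distinct classes present in S, S1, and S2 respectively, the MDLPC criterion accepts the partition induced by a cut point T if and only if

Gain(A, T; S) > \frac{\log_2(N - 1)}{N} + \frac{\Delta(A, T; S)}{N}

where

\Delta(A, T; S) = \log_2(3^k - 2) - [\, k \, Ent(S) - k_1 \, Ent(S_1) - k_2 \, Ent(S_2) \,]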

If the partition induced by a cut point T for a set S of N examples is accepted, the discretization process goes through, and we provide a discrete value to each resulting interval of the data set.

If the partition induced by the cut point T is rejected, the selected cut point was a poor choice, and candidate cut points are evaluated again on the given example data set.
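A minimal Python sketch of this accept/reject test, following the criterion stated above (the function name mdlpc_accepts and the toy label lists are illustrative, not from the paper):

    from math import log2
    from collections import Counter

    def entropy(labels):
        """Class entropy of a list of labels."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def mdlpc_accepts(s, s1, s2):
        """Return True if the partition of label list s into s1 and s2 is accepted."""
        n = len(s)
        k, k1, k2 = len(set(s)), len(set(s1)), len(set(s2))
        ent_s, ent_s1, ent_s2 = entropy(s), entropy(s1), entropy(s2)
        e = (len(s1) / n) * ent_s1 + (len(s2) / n) * ent_s2   # E(A, T; S)
        gain = ent_s - e                                      # Gain(A, T; S)
        delta = log2(3 ** k - 2) - (k * ent_s - k1 * ent_s1 - k2 * ent_s2)
        return gain > (log2(n - 1) / n) + (delta / n)

    # A clean split is accepted, a near-random one is rejected.
    print(mdlpc_accepts(['A'] * 8 + ['B'] * 8, ['A'] * 8, ['B'] * 8))        # True
    print(mdlpc_accepts(['A', 'B'] * 8, ['A', 'B'] * 4, ['A', 'B'] * 4))     # False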

Empirical Evaluation

We compare four different decision strategies for deciding whether or not to accept a partition. The following variations of the algorithm are evaluated:

Never Cut: the original binary-interval algorithm.

Always Cut: always accept a cut unless all examples have the same class or the same value for the attribute.

Random Cut: accept or reject a cut by flipping a fair coin.

MDLP Cut: use the derived MDLPC criterion.

Results

Thank you