Algorithm for Constructing Decision Tree


• TreeGrowth(TR, F):
      if stopping_cond(TR, F) = true then
          leaf = createNode()
          leaf.label = Classify(TR)
          return leaf
      else
          root = createNode()
          root.test_cond = find_best_split(TR, F)
          let V = {v | v is a possible outcome of root.test_cond}
          for each v ∈ V do
              TRv = {tr | tr ∈ TR and root.test_cond(tr) = v}
              child = TreeGrowth(TRv, F)
              add child as a descendant of root and label the edge (root → child) as v
          end for
      end if
      return root
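As a concrete illustration, below is a minimal Python sketch of the same recursion on categorical attributes. The slide leaves stopping_cond, find_best_split, and Classify abstract; the bodies given here (a purity/size stopping rule, a Gini-based split chooser, and majority-vote labeling) are illustrative assumptions:

    from collections import Counter

    # A training record is a (features, label) pair, where features is a
    # dict of categorical attribute values, e.g. ({"a": "x", "b": "y"}, 1).

    class Node:
        def __init__(self, label=None):
            self.attr = None        # attribute tested at this node
            self.children = {}      # outcome value -> child Node
            self.label = label      # class label (leaf nodes only)

    def classify(records):
        # Leaf label = majority class among the training records.
        return Counter(lbl for _, lbl in records).most_common(1)[0][0]

    def gini(records):
        n = len(records)
        counts = Counter(lbl for _, lbl in records)
        return 1.0 - sum((c / n) ** 2 for c in counts.values())

    def find_best_split(records, attrs):
        # Choose the attribute whose partition minimizes the weighted
        # Gini index of the child nodes.
        def weighted_gini(attr):
            parts = {}
            for rec, lbl in records:
                parts.setdefault(rec[attr], []).append((rec, lbl))
            n = len(records)
            return sum(len(p) / n * gini(p) for p in parts.values())
        return min(attrs, key=weighted_gini)

    def stopping_cond(records, attrs, min_records=5):
        # Stop when the node is class-pure, no attributes remain, or the
        # partition is too small (see point 7, data fragmentation).
        return (len({lbl for _, lbl in records}) == 1
                or not attrs or len(records) < min_records)

    def tree_growth(records, attrs):
        if stopping_cond(records, attrs):
            return Node(label=classify(records))
        root = Node()
        root.attr = find_best_split(records, attrs)
        for v in {rec[root.attr] for rec, _ in records}:  # one branch per outcome
            subset = [(r, l) for r, l in records if r[root.attr] == v]
            root.children[v] = tree_growth(subset, attrs - {root.attr})
        return root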


Characteristics of Decision Tree Induction


• 1. Decision tree induction (DTI) is a nonparametric approach for building classification models: it does not require any prior assumption about the probability distribution of the data.

• 2. The algorithm presented so far uses a top-down, recursive partitioning strategy to induce a reasonable, though not necessarily optimal, solution.

• 3. Techniques developed for constructing decision trees are computationally inexpensive, making it possible to build models quickly even when the data set is very large.


• 4. Decision trees, especially small ones, are relatively easy to interpret.

• 5. Decision trees are robust to the presence of noise.

• 6. The presence of redundant attributes does not adversely affect the accuracy of the decision tree.

• An attribute is redundant if it is strongly correlated with another attribute in the data.

• If the data contains irrelevant attributes, feature selection techniques can help improve the accuracy of the decision tree by eliminating such attributes during preprocessing, as in the sketch below.
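A minimal sketch of such a preprocessing filter, under the assumption of numeric attributes in a NumPy array and a simple correlation-with-label criterion (the 0.1 threshold is an arbitrary illustration):

    import numpy as np

    def select_relevant(X, y, threshold=0.1):
        # Keep attributes whose absolute Pearson correlation with the
        # class label exceeds an (illustrative) threshold; the rest are
        # treated as irrelevant and dropped before tree construction.
        keep = []
        for j in range(X.shape[1]):
            r = np.corrcoef(X[:, j], y)[0, 1]
            if abs(r) >= threshold:
                keep.append(j)
        return keep

    # Example: column 1 is pure noise, uncorrelated with the label.
    X = np.array([[0.1, 0.9],
                  [0.2, 0.1],
                  [0.8, 0.9],
                  [0.9, 0.1]])
    y = np.array([0, 0, 1, 1])
    print(select_relevant(X, y))  # -> [0]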


• 7. Data fragmentation: the number of instances gets smaller as you traverse down the tree.

• The number of instances at a leaf node may become too small to support any statistically significant decision about its class.

• One possible solution is to disallow further splitting when the number of records falls below a certain threshold, as in the sketch below.
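In terms of the Python sketch under the TreeGrowth pseudocode above, the fix is just a guard clause in stopping_cond (the threshold of 20 records is an arbitrary illustration):

    def stopping_cond(records, attrs, min_records=20):
        # Refuse to split a partition that is too small to support a
        # statistically meaningful decision.
        if len(records) < min_records:
            return True
        return len({lbl for _, lbl in records}) == 1 or not attrs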


9. Decision Boundary

[Figure: a decision tree with root test x < 0.43 and second-level tests y < 0.47 and y < 0.33, shown next to the corresponding partitioning of the unit square in the (x, y) attribute space; each of the four rectangular regions is pure, containing 3 or 4 records of a single class.]

• The test conditions considered so far involve only a single attribute at a time.

• Tree construction can therefore be viewed as partitioning the attribute space into disjoint regions until each region contains only records of the same class label.

• The borderline between two neighboring regions of different classes is known as the decision boundary.

• The decision boundaries are parallel to the coordinate axes because each test condition involves a single attribute at a time. This limits the expressiveness of the decision tree representation for modeling complex relationships among continuous attributes. The sketch below makes the axis-parallel structure explicit.
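Written out as code, the tree in the figure above is just nested single-attribute threshold tests, so every boundary it draws is a horizontal or vertical line. (Which test sits on which branch and which class occupies which region are assumptions here; those details were not recoverable from the figure residue.)

    def predict(x: float, y: float) -> str:
        # Each comparison tests one attribute against one threshold, so
        # each split is a line parallel to an axis of the (x, y) plane.
        if x < 0.43:
            return "A" if y < 0.47 else "B"
        return "B" if y < 0.33 else "A"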

10. Oblique Decision Trees

[Figure: an oblique decision tree whose single test condition is x + y < 1; the 'yes' branch leads to Class = + and the 'no' branch to Class = −.]

• In an oblique decision tree, a test condition may involve multiple attributes, such as x + y < 1.

• This yields a more expressive representation.

• However, finding the optimal test condition is computationally expensive.

• The data shown in the figure cannot be classified effectively by a decision tree that uses a single-attribute test condition at a time, while the oblique tree needs only one split; see the sketch below.
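A minimal sketch contrasting the two kinds of test condition. The weights and threshold in oblique_test are taken from the figure's x + y < 1 split; in general they must be searched for, which is what makes oblique tree induction expensive:

    # Axis-parallel test: one attribute compared with one threshold.
    def axis_parallel_test(rec):
        return rec["x"] < 0.43

    # Oblique test: a linear combination of attributes, as in the figure.
    # Learning good weights (w1, w2) and threshold t is the costly step.
    def oblique_test(rec, w1=1.0, w2=1.0, t=1.0):
        return w1 * rec["x"] + w2 * rec["y"] < t

    print(oblique_test({"x": 0.2, "y": 0.3}))  # -> True  (Class = +)
    print(oblique_test({"x": 0.8, "y": 0.9}))  # -> False (Class = -)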

11. Tree Replication

[Figure: a decision tree rooted at P, in which the same subtree (rooted at Q, with child S and 0/1 leaves) appears replicated in two different branches.]

• The same subtree can appear in multiple branches, as in the figure above.

• This replication makes the decision tree more difficult to interpret.

• The situation arises when the construction of the tree relies on a single-attribute test condition at each internal node.

• 8. Expressiveness: Decision trees provide an expressive representation for learning discrete-valued functions, but they do not generalize well to certain types of Boolean functions.

• Example: the parity function:

– Class = 1 if there is an even number of Boolean attributes with truth value = True

– Class = 0 if there is an odd number of Boolean attributes with truth value = True

• Accurate modeling of this function requires a complete tree with 2^d nodes, where d is the number of Boolean attributes (see the sketch below).

• (Q: Draw a decision tree for a parity function with 4 Boolean attributes A, B, C, D.)

• Decision trees are also not expressive enough for modeling continuous variables, particularly when the test condition involves only a single attribute at a time.
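A brief sketch of why parity is hard for decision trees: flipping any single attribute flips the class, so no attribute can ever be skipped and every path must test all d attributes. The enumeration below (with illustrative d = 4) prints the full truth table the tree would have to memorize:

    from itertools import product

    def parity_class(values):
        # Class = 1 for an even number of True attributes, else 0.
        return 1 if sum(values) % 2 == 0 else 0

    d = 4  # e.g. Boolean attributes A, B, C, D
    for values in product([False, True], repeat=d):
        print(values, "->", parity_class(values))

    # Every one of the 2**d = 16 rows differs in class from each of its
    # d neighbors at Hamming distance 1, so the tree needs 2**d leaves.
    print("leaves needed:", 2 ** d)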


• 12. Studies have shown that the choice of impurity measure has little impact on the performance of decision tree induction algorithms, since the different measures are quite consistent with each other. Two common measures are sketched below.
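For reference, a minimal sketch of two standard impurity measures, entropy and the Gini index, evaluated on a few illustrative two-class distributions:

    import math

    def entropy(p):
        # Entropy(t) = -sum_i p_i * log2(p_i), with 0 * log(0) taken as 0.
        return -sum(pi * math.log2(pi) for pi in p if pi > 0)

    def gini(p):
        # Gini(t) = 1 - sum_i p_i**2
        return 1.0 - sum(pi ** 2 for pi in p)

    # Both measures peak at the uniform distribution and vanish for a
    # pure node, which is why they tend to rank candidate splits alike.
    for dist in [(0.5, 0.5), (0.7, 0.3), (1.0, 0.0)]:
        print(dist, "entropy=%.3f" % entropy(dist), "gini=%.3f" % gini(dist))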