Presentation on Decision Trees. Presented to: Sir Marooof Pasha.
Group members
Kiran Shakoor 07-05
Nazish Yaqoob 07-11
Razeena Ameen 07-25
Ayesha Yaseen 07-31
Nudrat Rehman 07-47
Decision Trees
Rules for classifying data using attributes.
The tree consists of decision nodes and leaf nodes.
A decision node has two or more branches, each representing values for the attribute tested.
A leaf node represents a class label and does not require additional classification testing.
The Process of Constructing a Decision Tree
Select an attribute to place at the root of the decision tree and make one branch for every possible value.
Repeat the process recursively for each branch.
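As a sketch, these two steps can be written as a short recursive procedure (Python; the trivial attribute choice below is only a placeholder for a real selection criterion such as information gain, and all function and variable names are ours):

```python
from collections import Counter

def build_tree(rows, attributes):
    """rows: list of (features_dict, label) pairs; attributes: names still available to test."""
    labels = {label for _, label in rows}
    if len(labels) == 1:                 # pure subset: make a leaf node
        return labels.pop()
    if not attributes:                   # nothing left to test: majority-class leaf
        return Counter(label for _, label in rows).most_common(1)[0][0]
    attr = attributes[0]                 # placeholder choice; ID3 would pick by information gain
    branches = {}
    for value in {feats[attr] for feats, _ in rows}:   # one branch per possible value
        subset = [r for r in rows if r[0][attr] == value]
        branches[value] = build_tree(subset, [a for a in attributes if a != attr])
    return {attr: branches}              # decision node
```

The recursion bottoms out when a branch's subset is pure (a leaf) or no attributes remain.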
Decision Trees Example
(Figure: an example tree splitting on Age (<30 vs >=30) and, within one branch, on Car Type (Minivan vs Sports/Truck), with YES/NO class labels at the leaves; a companion plot shows the corresponding regions along an Age axis from 0 to 60.)
ID code | Outlook  | Temperature | Humidity | Windy | Play
--------|----------|-------------|----------|-------|-----
a       | Sunny    | Hot         | High     | False | No
b       | Sunny    | Hot         | High     | True  | No
c       | Overcast | Hot         | High     | False | Yes
d       | Rainy    | Mild        | High     | False | Yes
e       | Rainy    | Cool        | Normal   | False | Yes
f       | Rainy    | Cool        | Normal   | True  | No
g       | Overcast | Cool        | Normal   | True  | Yes
h       | Sunny    | Mild        | High     | False | No
i       | Sunny    | Cool        | Normal   | False | Yes
j       | Rainy    | Mild        | Normal   | False | Yes
k       | Sunny    | Mild        | Normal   | True  | Yes
l       | Overcast | Mild        | High     | True  | Yes
m       | Overcast | Hot         | Normal   | False | Yes
n       | Rainy    | Mild        | High     | True  | No
Decision Tree Example
Advantages of decision trees
Simple to understand and interpret
Able to handle both numerical and categorical data
Possible to validate a model using statistical tests
Nudrat Rehman 07-47
What is ID3?
A mathematical algorithm for building the decision tree.
Invented by J. Ross Quinlan in 1979.
Uses Information Theory invented by Shannon in 1948.
Builds the tree from the top down, with no backtracking.
Information Gain is used to select the most useful attribute for classification.
Entropy
A formula to calculate the homogeneity of a sample.
A completely homogeneous sample has entropy of 0.
An equally divided sample has entropy of 1.
Entropy(S) = -p+ log2(p+) - p- log2(p-) for a sample of negative and positive elements.

The formula for entropy:
Entropy(S) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
Entropy Example
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
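This calculation is easy to check in code; a minimal sketch (Python; the function name is ours):

```python
import math

def entropy(p, n):
    """Entropy of a sample with p positive and n negative examples (0 * log2(0) taken as 0)."""
    total = p + n
    return -sum(c / total * math.log2(c / total) for c in (p, n) if c)
```

entropy(9, 5) reproduces the 0.940 above; entropy(7, 7) gives 1.0 for an equally divided sample, and entropy(14, 0) gives 0 for a completely homogeneous one.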
Information Gain (IG)
The information gain is based on the decrease in entropy after a dataset is split on an attribute.
Which attribute creates the most homogeneous branches?
First the entropy of the total dataset is calculated.
The dataset is then split on the different attributes.
The entropy for each branch is calculated, then added proportionally to get the total entropy for the split.
The resulting entropy is subtracted from the entropy before the split.
The result is the Information Gain, or decrease in entropy.
The attribute that yields the largest IG is chosen for the decision node.
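The steps above can be sketched as a small helper working on (Yes, No) counts. Applied to the weather table, splitting on Outlook gives branches Sunny (2 Yes, 3 No), Overcast (4 Yes, 0 No), and Rainy (3 Yes, 2 No), for a gain of about 0.247 (Python; the names are ours):

```python
import math

def entropy(p, n):
    """Entropy of a set with p positive and n negative examples (0 * log2(0) taken as 0)."""
    total = p + n
    return -sum(c / total * math.log2(c / total) for c in (p, n) if c)

def information_gain(parent, children):
    """parent: (yes, no) counts before the split; children: one (yes, no) pair per branch."""
    total = sum(parent)
    split_entropy = sum((p + n) / total * entropy(p, n) for p, n in children)
    return entropy(*parent) - split_entropy

# Gain of splitting the 14-row weather data on Outlook:
gain_outlook = information_gain((9, 5), [(2, 3), (4, 0), (3, 2)])
```

The weighted sum over branches is exactly the "added proportionally" step from the slide.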
Information Gain (cont'd)
A branch set with entropy of 0 is a leaf node.
Otherwise, the branch needs further splitting to classify its dataset.
The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
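Putting entropy, information gain, and the recursion together gives a compact ID3 sketch. Run on the 14-row weather table above (rows a through n), it selects Outlook for the root; all function names here are ours:

```python
import math
from collections import Counter

# The weather dataset from the table above, as (attributes, Play) pairs.
DATA = [
    ({"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Windy": False}, "No"),
    ({"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Windy": True},  "No"),
    ({"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Windy": False}, "Yes"),
    ({"Outlook": "Rainy",    "Temperature": "Mild", "Humidity": "High",   "Windy": False}, "Yes"),
    ({"Outlook": "Rainy",    "Temperature": "Cool", "Humidity": "Normal", "Windy": False}, "Yes"),
    ({"Outlook": "Rainy",    "Temperature": "Cool", "Humidity": "Normal", "Windy": True},  "No"),
    ({"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Windy": True},  "Yes"),
    ({"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "High",   "Windy": False}, "No"),
    ({"Outlook": "Sunny",    "Temperature": "Cool", "Humidity": "Normal", "Windy": False}, "Yes"),
    ({"Outlook": "Rainy",    "Temperature": "Mild", "Humidity": "Normal", "Windy": False}, "Yes"),
    ({"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "Normal", "Windy": True},  "Yes"),
    ({"Outlook": "Overcast", "Temperature": "Mild", "Humidity": "High",   "Windy": True},  "Yes"),
    ({"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "Normal", "Windy": False}, "Yes"),
    ({"Outlook": "Rainy",    "Temperature": "Mild", "Humidity": "High",   "Windy": True},  "No"),
]

def entropy(rows):
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(rows, attr):
    total = len(rows)
    split = 0.0
    for value in {feats[attr] for feats, _ in rows}:
        subset = [r for r in rows if r[0][attr] == value]
        split += len(subset) / total * entropy(subset)
    return entropy(rows) - split

def id3(rows, attributes):
    labels = {label for _, label in rows}
    if len(labels) == 1:                       # entropy 0: leaf node
        return labels.pop()
    if not attributes:                         # no attributes left: majority-class leaf
        return Counter(label for _, label in rows).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a))   # largest IG wins
    remaining = [a for a in attributes if a != best]
    return {best: {v: id3([r for r in rows if r[0][best] == v], remaining)
                   for v in {feats[best] for feats, _ in rows}}}

tree = id3(DATA, ["Outlook", "Temperature", "Humidity", "Windy"])
```

The resulting tree tests Outlook at the root, sends Overcast straight to Yes, and recurses on Humidity for Sunny and on Windy for Rainy, classifying all 14 rows.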
Nazish Yaqoob 07-11
Example: The Simpsons
Person | Hair Length | Weight | Age | Class
-------|-------------|--------|-----|------
Homer  | 0"          | 250    | 36  | M
Marge  | 10"         | 150    | 34  | F
Bart   | 2"          | 90     | 10  | M
Lisa   | 6"          | 78     | 8   | F
Maggie | 4"          | 20     | 1   | F
Abe    | 1"          | 170    | 70  | M
Selma  | 8"          | 160    | 41  | F
Otto   | 10"         | 180    | 38  | M
Krusty | 6"          | 200    | 45  | M
Comic  | 8"          | 290    | 38  | ?
Let us try splitting on Hair Length:

Hair Length <= 5?  (yes / no)

Entropy(4F,5M) = -(4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911
Entropy(1F,3M) = -(1/4) log2(1/4) - (3/4) log2(3/4) = 0.8113  (yes branch)
Entropy(3F,2M) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.9710  (no branch)

Entropy(S) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
Gain(A) = E(current set) - Σ E(child sets), weighted by branch size

Gain(Hair Length <= 5) = 0.9911 - (4/9 × 0.8113 + 5/9 × 0.9710) = 0.0911
Let us try splitting on Weight:

Weight <= 160?  (yes / no)

Entropy(4F,5M) = -(4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911
Entropy(4F,1M) = -(4/5) log2(4/5) - (1/5) log2(1/5) = 0.7219  (yes branch)
Entropy(0F,4M) = -(0/4) log2(0/4) - (4/4) log2(4/4) = 0  (no branch)

Gain(Weight <= 160) = 0.9911 - (5/9 × 0.7219 + 4/9 × 0) = 0.5900
Let us try splitting on Age:

Age <= 40?  (yes / no)

Entropy(4F,5M) = -(4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911
Entropy(3F,3M) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1  (yes branch)
Entropy(1F,2M) = -(1/3) log2(1/3) - (2/3) log2(2/3) = 0.9183  (no branch)

Gain(Age <= 40) = 0.9911 - (6/9 × 1 + 3/9 × 0.9183) = 0.0183
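The three gains above can be checked with a few lines of code; a sketch working directly on (female, male) branch counts (Python; all names are ours):

```python
import math

def entropy(f, m):
    """Entropy of a set with f females and m males (0 * log2(0) taken as 0)."""
    total = f + m
    return -sum(c / total * math.log2(c / total) for c in (f, m) if c)

def gain(parent, branches):
    """parent and each branch are (females, males) count pairs."""
    total = sum(parent)
    split = sum(sum(b) / total * entropy(*b) for b in branches)
    return entropy(*parent) - split

# The three candidate splits from the slides:
g_hair   = gain((4, 5), [(1, 3), (3, 2)])   # Hair Length <= 5
g_weight = gain((4, 5), [(4, 1), (0, 4)])   # Weight <= 160
g_age    = gain((4, 5), [(3, 3), (1, 2)])   # Age <= 40
```

Weight yields by far the largest gain, matching the choice made next.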
Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified, so we simply recurse!
This time we find that we can split on Hair Length, and we are done!
We don't need to keep the data around, just the test conditions:

Weight <= 160?
  yes → Hair Length <= 2?
            yes → Male
            no  → Female
  no  → Male
How would these people be classified?
It is trivial to convert Decision Trees to rules:

Weight <= 160?
  yes → Hair Length <= 2?
            yes → Male
            no  → Female
  no  → Male
Rules to Classify Males/Females
If Weight greater than 160, classify as Male
Else if Hair Length less than or equal to 2, classify as Male
Else classify as Female
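These rules translate directly into code; a minimal sketch (Python; the function name is ours):

```python
def classify(weight, hair_length):
    """The Male/Female rules from the tree above."""
    if weight > 160:
        return "Male"            # the heavy branch is purely male
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"
```

On the table above this labels Homer (250 lb) and Bart (2" hair) as Male and Marge and Lisa as Female, and it would classify the unlabeled comic-book guy (290 lb) as Male.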