Data Mining-Knowledge Presentation—ID3 algorithm
Transcript of Data Mining-Knowledge Presentation—ID3 algorithm
Prof. Sin-Min Lee, Department of Computer Science
Data Mining Tasks

Predicting onto new data by using rules, patterns, and behaviors
– Classification
– Estimation
Understanding the groupings, trends, and characteristics of your customers
– Segmentation
Visualizing the Euclidean spatial relationships, trends, and patterns of your data
– Description
Stages of Data Mining Process

1. Data gathering, e.g., data warehousing.
2. Data cleansing: eliminate errors and/or bogus data, e.g., patient fever = 125.
3. Feature extraction: obtaining only the interesting attributes of the data, e.g., “date acquired” is probably not useful for clustering celestial objects, as in Skycat.
4. Pattern extraction and discovery. This is the stage that is often thought of as “data mining” and is where we shall concentrate our effort.
5. Visualization of the data.
6. Evaluation of results; not every discovered fact is useful, or even true! Judgment is necessary before following your software's conclusions.
Clusters of Galaxies

Skycat clustered 2×10⁹ sky objects into stars, galaxies, quasars, etc. Each object was a point in a space of 7 dimensions, with each dimension representing radiation in one band of the spectrum. The Sloan Sky Survey is a more ambitious attempt to catalog and cluster the entire visible universe.
Clustering: Examples

Cholera outbreak in London
Decision trees are an alternative way of structuring rule information.
[Decision tree for the weather data: outlook at the root; sunny → humidity (normal → P, high → N), overcast → P, rain → windy (true → N, false → P).]
Classification rules based on the tree

if outlook = overcast then P
if outlook = sunny & humidity = normal then P
if outlook = rain & windy = false then P
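These rules can be expressed directly as a short predicate. The sketch below is illustrative: the function name and the boolean encoding of windy are assumptions, not from the slides.

```python
def predict_play(outlook, humidity, windy):
    """Apply the three classification rules read off the decision tree.

    Returns "P" (play) when some rule fires, otherwise "N" (don't play).
    """
    if outlook == "overcast":
        return "P"
    if outlook == "sunny" and humidity == "normal":
        return "P"
    if outlook == "rain" and not windy:
        return "P"
    return "N"

print(predict_play("overcast", "high", True))  # P
print(predict_play("sunny", "normal", False))  # P
print(predict_play("sunny", "high", False))    # N
```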
[Decision tree: Outlook at the root with branches Sunny, Overcast, and Rain; the Sunny branch tests Humidity, whose High branch classifies No and whose Normal branch classifies Yes.]

Each internal node tests an attribute.
Each branch corresponds to an attribute value.
Each leaf node assigns a classification.
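One minimal way to realize this structure in code is with nested dicts: an internal node maps an attribute to its value branches, and a leaf is a class label. This is a sketch; the encoding and names are chosen for illustration.

```python
# A tree is either a leaf (a class label) or a dict
# {attribute: {value: subtree, ...}}.
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def classify(tree, example):
    """Walk from the root: test each node's attribute and follow the
    branch matching the example's value until a leaf gives the class."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][example[attribute]]
    return tree

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
```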
Top-Down Induction of Decision Trees (ID3)

1. A ← the “best” decision attribute for the next node.
2. Assign A as the decision attribute for the node.
3. For each value of A, create a new descendant.
4. Sort the training examples to the leaf nodes according to the attribute value of the branch.
5. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes.
Which Attribute is ”best”?

A1 splits S = [29+,35-] into True → [21+, 5-] and False → [8+, 30-].
A2 splits S = [29+,35-] into True → [18+, 33-] and False → [11+, 2-].
Entropy

S is a sample of training examples.
p+ is the proportion of positive examples.
p- is the proportion of negative examples.
Entropy measures the impurity of S:

Entropy(S) = -p+ log2 p+ - p- log2 p-
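The definition translates directly into a few lines of Python; this sketch (function name illustrative) treats 0·log2 0 as 0, as the definition requires.

```python
from math import log2

def entropy(p_pos, p_neg):
    """Entropy(S) = -p+ log2 p+ - p- log2 p-, with 0*log2(0) taken as 0."""
    return sum(-p * log2(p) for p in (p_pos, p_neg) if p > 0)

print(entropy(0.5, 0.5))              # 1.0 -- maximally impure
print(entropy(1.0, 0.0))              # 0.0 -- a pure sample
print(round(entropy(9/14, 5/14), 3))  # 0.94 -- the [9+,5-] set used later
```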
Entropy(S) = the expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code).

Why? Information theory: an optimal-length code assigns -log2 p bits to a message having probability p. So the expected number of bits to encode the class (+ or -) of a random member of S is:

-p+ log2 p+ - p- log2 p-
Information Gain

• Gain(S,A): the expected reduction in entropy due to sorting S on attribute A.

Gain(S,A) = Entropy(S) - Σv∈Values(A) (|Sv|/|S|) Entropy(Sv)

Entropy([29+,35-]) = -29/64 log2 29/64 - 35/64 log2 35/64 = 0.99

(As before, A1 splits S = [29+,35-] into [21+,5-] and [8+,30-]; A2 splits it into [18+,33-] and [11+,2-].)
Information Gain

A1 splits S = [29+,35-] into True → [21+,5-] and False → [8+,30-]:
Entropy([21+,5-]) = 0.71, Entropy([8+,30-]) = 0.74
Gain(S,A1) = Entropy(S) - (26/64)*Entropy([21+,5-]) - (38/64)*Entropy([8+,30-]) = 0.27

A2 splits S = [29+,35-] into True → [18+,33-] and False → [11+,2-]:
Entropy([18+,33-]) = 0.94, Entropy([11+,2-]) = 0.62
Gain(S,A2) = Entropy(S) - (51/64)*Entropy([18+,33-]) - (13/64)*Entropy([11+,2-]) = 0.12
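The two candidate splits can be checked numerically. This sketch (helper names illustrative) works from raw positive/negative counts and reproduces the gains above.

```python
from math import log2

def entropy(pos, neg):
    """Two-class entropy from raw counts; 0*log2(0) is treated as 0."""
    total = pos + neg
    return sum(-c/total * log2(c/total) for c in (pos, neg) if c)

def gain(parent, splits):
    """Entropy(S) minus the size-weighted entropies of the branches."""
    n = sum(parent)
    return entropy(*parent) - sum((p + q)/n * entropy(p, q) for p, q in splits)

# S = [29+,35-]; A1 splits it into [21+,5-] and [8+,30-],
# A2 splits it into [18+,33-] and [11+,2-].
print(round(gain((29, 35), [(21, 5), (8, 30)]), 2))   # 0.27
print(round(gain((29, 35), [(18, 33), (11, 2)]), 2))  # 0.12
```

Since 0.27 > 0.12, ID3 would branch on A1 here.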
Training Examples

| Day | Outlook | Temp. | Humidity | Wind | Play Tennis |
|-----|----------|-------|----------|--------|-------------|
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Hot | High | Weak | Yes |
| D4 | Rain | Mild | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Weak | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Strong | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |
Selecting the Next Attribute
Humidity splits S = [9+,5-] (E = 0.940) into High → [3+,4-] (E = 0.985) and Normal → [6+,1-] (E = 0.592).
Gain(S,Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151

Wind splits S = [9+,5-] (E = 0.940) into Weak → [6+,2-] (E = 0.811) and Strong → [3+,3-] (E = 1.0).
Gain(S,Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048
Outlook splits S = [9+,5-] (E = 0.940) into Sunny → [2+,3-] (E = 0.971), Overcast → [4+,0-] (E = 0.0), and Rain → [3+,2-] (E = 0.971).
Gain(S,Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
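The three gains on the full PlayTennis set can be reproduced with the same count-based helpers (names illustrative). Note the slides round Gain(S,Humidity) to 0.151; the unrounded value is about 0.1518.

```python
from math import log2

def entropy(pos, neg):
    total = pos + neg
    return sum(-c/total * log2(c/total) for c in (pos, neg) if c)

def gain(parent, splits):
    n = sum(parent)
    return entropy(*parent) - sum((p + q)/n * entropy(p, q) for p, q in splits)

S = (9, 5)  # the 14 PlayTennis examples
print(round(gain(S, [(3, 4), (6, 1)]), 3))          # Humidity: 0.152 (slide: 0.151)
print(round(gain(S, [(6, 2), (3, 3)]), 3))          # Wind: 0.048
print(round(gain(S, [(2, 3), (4, 0), (3, 2)]), 3))  # Outlook: 0.247 -- the winner
```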
ID3 Algorithm

Outlook splits [D1,D2,…,D14] = [9+,5-]:
Sunny → Ssunny = [D1,D2,D8,D9,D11] = [2+,3-] → ?
Overcast → [D3,D7,D12,D13] = [4+,0-] → Yes
Rain → [D4,D5,D6,D10,D14] = [3+,2-] → ?

Gain(Ssunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
Gain(Ssunny, Temp.) = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570
Gain(Ssunny, Wind) = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019
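The recursion on the Sunny branch can be checked the same way: the sketch below (helper names illustrative) recomputes the three Ssunny gains from the branch counts.

```python
from math import log2

def entropy(pos, neg):
    total = pos + neg
    return sum(-c/total * log2(c/total) for c in (pos, neg) if c)

def gain(parent, splits):
    n = sum(parent)
    return entropy(*parent) - sum((p + q)/n * entropy(p, q) for p, q in splits)

S_sunny = (2, 3)  # [D1,D2,D8,D9,D11]
print(round(gain(S_sunny, [(0, 3), (2, 0)]), 2))          # Humidity: 0.97
print(round(gain(S_sunny, [(0, 2), (1, 1), (1, 0)]), 2))  # Temp.: 0.57
print(round(gain(S_sunny, [(1, 2), (1, 1)]), 2))          # Wind: 0.02
```

Humidity wins, which is why the final tree tests Humidity under Sunny.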
[Final decision tree: Outlook at the root. Sunny → Humidity: High → No [D1,D2], Normal → Yes [D8,D9,D11]. Overcast → Yes [D3,D7,D12,D13]. Rain → Wind: Strong → No [D6,D14], Weak → Yes [D4,D5,D10].]
The ID3 Algorithm

Given
• a set of disjoint target classes {C1, C2, …, Ck},
• a set of training data, S, containing objects of more than one class.

Let T be any test on a single attribute of the data, with O1, O2, …, On representing the possible outcomes of applying T to any object x (written as T(x)).

T produces a partition {S1, S2, …, Sn} of S such that

Si = { x | T(x) = Oi }
Proceed recursively to replace each Si with a decision tree.

Crucial factor: selecting the tests.

[Diagram: the set S splits under outcomes O1, O2, …, On into subsets S1, S2, …, Sn.]
In making this decision, Quinlan employs the notion of uncertainty (entropy from information theory).
M = {m1, m2, …, mn}: the set of messages
p(mi): the probability of message mi being received
I(mi) = -log p(mi): the amount of information in message mi
U(M) = Σi p(mi) I(mi): the uncertainty of the set M

Quinlan's assumptions:
• A correct decision tree for S will classify objects in the same proportion as their representation in S.
• Given a case to classify, a test can be regarded as the source of a message about that case.
Let Ni be the number of cases in S that belong to class Ci:

p(c ∈ Ci) = Ni / |S|

The uncertainty, U(S), measures the average amount of information needed to determine the class of a random case, c ∈ S.

Uncertainty measure after S has been partitioned:

UT(S) = Σi (|Si| / |S|) U(Si)

Select the test T that gains the most information, i.e., for which

GS(T) = U(S) - UT(S)

is maximal.
Evaluation of ID3

The ID3 algorithm tends to favor tests with a large number of outcomes over tests with a smaller number. Its computational complexity depends on the cost of choosing the next test to branch on. It was adapted to deal with noisy and incomplete data, and it is a feasible alternative to knowledge elicitation if sufficient data of the right kind are available. However, the method is not incremental.

Further modifications were introduced in C4.5, e.g.:
• pruning the decision tree in order to avoid overfitting
• a better test-selection heuristic
Search Space and Search Trees

• The search space is a logical space composed of
  – nodes, which are search states
  – links, which are all legal connections between search states
    • e.g., in chess there is no link between states where White castles having previously moved the King.
• The space is always just an abstraction.
• Think of search algorithms as trying to navigate this extremely complex space.
Search Trees

• Search trees do not summarise all possible searches; each is an abstraction of one possible search.
• The root is the null state.
• Edges represent one choice, e.g., setting the value of A first.
• Child nodes represent extensions; the children give all possible choices.
• Leaf nodes are solutions or failures.
• Example in SAT:
  – the algorithm detects failure early
  – it need not pick the same variables everywhere

[Figure: a SAT search tree starting from state = (). The root chooses A; subsequent nodes choose B or C. Several branches (e.g., aB, ab) are marked impossible, while the branch (AbC) is a solution.]
Definition

• A tree-shaped structure that represents a set of decisions. These decisions are used as a basis for predictions.
• Decision trees represent rules for classifying datasets; useful knowledge can be extracted by this classification.
DT Structure

• Node types
  – Decision nodes: specify some test to be carried out on a single attribute value. Each outcome is assigned to a branch that connects to a leaf node or another decision node.
  – Leaf nodes: indicate the classification of an example.
An Example
Growing a Decision Tree
Iterative Dichotomiser 3 (ID3) Algorithm

• Invented by J. Ross Quinlan in 1975.
• Based on a greedy search algorithm.
ID3 Cont.

• The goal is to create the best possible tree that works on all available data.
  – Each example strictly belongs to one class or the other.
  – We need to select the attribute that best classifies the examples (i.e., the attribute with the smallest entropy over all the examples).
  – The lower the entropy, the higher the Information Gain. We desire high IG.
Entropy

• A quantitative measurement of the homogeneity of a set of examples.
• It tells us how well an attribute separates the training examples according to their target classification.
Entropy cont.

• Given a set S with only positive or negative examples (the 2-class case):

Entropy(S) = -PP log2 PP - PN log2 PN

where PP is the proportion of positive examples and PN is the proportion of negative examples.
Entropy cont.

• Example: given 25 examples with 15 positive and 10 negative,

Entropy(S) = -(15/25) log2(15/25) - (10/25) log2(10/25) = 0.97

If Entropy(S) = 0, all members of S belong to strictly one class.
If Entropy(S) = 1 (the maximum value), members are split equally between the two classes.
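The worked example and both boundary cases can be checked with a short count-based helper (name illustrative):

```python
from math import log2

def entropy(pos, neg):
    """Two-class entropy from example counts; 0*log2(0) is treated as 0."""
    total = pos + neg
    return sum(-c/total * log2(c/total) for c in (pos, neg) if c)

print(round(entropy(15, 10), 2))  # 0.97 -- the 25-example case above
print(entropy(25, 0))             # 0.0  -- all one class
print(entropy(10, 10))            # 1.0  -- an even split
```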
In General…

• In general, if an attribute takes more than two values,

Entropy(S) = -Σi=1..n pi log(pi)

where n is the number of values and pi is the proportion of examples taking the i-th value.
Information Gain

Gain(S,A) = Entropy(S) - Σv∈Values(A) (|Sv|/|S|) Entropy(Sv)

where A is an attribute of S, Values(A) is the set of possible values of A, v is a particular value in Values(A), and Sv is the subset of S whose value for A is v.
Actual Algorithm

ID3(Examples, Target_Attribute, Attributes)
  Create a root node for the tree.
  If all examples are positive, return the single-node tree Root with label = +.
  If all examples are negative, return the single-node tree Root with label = -.
  If the set of predicting attributes is empty, return the single-node tree Root with label = the most common value of the target attribute in the examples.
  Otherwise begin:
    A = the attribute that best classifies the examples.
    Decision tree attribute for Root = A.
    For each possible value vi of A:
      Add a new tree branch below Root, corresponding to the test A = vi.
      Let Examples(vi) be the subset of examples that have the value vi for A.
      If Examples(vi) is empty:
        then below this new branch add a leaf node with label = the most common target value in the examples
      else below this new branch add the subtree ID3(Examples(vi), Target_Attribute, Attributes - {A}).
  End
  Return Root
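The pseudocode can be sketched in a few dozen lines of Python. This is an illustrative implementation, not the original: it represents examples as dicts, handles any class labels (not just +/-), and picks the best attribute by information gain.

```python
from collections import Counter
from math import log2

def entropy(examples, target):
    """Entropy of the target-attribute distribution over `examples`."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return sum(-c/total * log2(c/total) for c in counts.values())

def gain(examples, target, attribute):
    """Information gain of splitting `examples` on `attribute`."""
    groups = {}
    for ex in examples:
        groups.setdefault(ex[attribute], []).append(ex)
    remainder = sum(len(g)/len(examples) * entropy(g, target)
                    for g in groups.values())
    return entropy(examples, target) - remainder

def id3(examples, target, attributes):
    """Sketch of the pseudocode above: leaves are class labels,
    internal nodes are (attribute, {value: subtree}) pairs."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                 # all one class
        return labels[0]
    if not attributes:                        # no predictors left
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, target, a))
    branches = {}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        branches[value] = id3(subset, target,
                              [a for a in attributes if a != best])
    return (best, branches)

# Tiny illustrative run (data made up for the demo):
toy = [{"Outlook": "overcast", "Play": "Yes"},
       {"Outlook": "sunny",    "Play": "No"},
       {"Outlook": "rain",     "Play": "Yes"}]
root, branches = id3(toy, "Play", ["Outlook"])
print(root)               # Outlook
print(branches["sunny"])  # No
```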
The 14 training examples (independent attributes: Outlook, Temperature, Humidity, Windy; dependent attribute: Play):

| Outlook | Temperature | Humidity | Windy | Play |
|----------|-------------|----------|-------|------------|
| sunny | 85 | 85 | FALSE | Don't play |
| sunny | 80 | 90 | TRUE | Don't play |
| overcast | 83 | 78 | FALSE | Play |
| rain | 70 | 95 | FALSE | Play |
| rain | 68 | 80 | FALSE | Play |
| rain | 65 | 70 | TRUE | Don't play |
| overcast | 64 | 65 | TRUE | Play |
| sunny | 72 | 95 | FALSE | Don't play |
| sunny | 69 | 70 | FALSE | Play |
| rain | 75 | 80 | FALSE | Play |
| sunny | 75 | 70 | TRUE | Play |
| overcast | 72 | 90 | TRUE | Play |
| overcast | 81 | 75 | FALSE | Play |
| rain | 71 | 80 | TRUE | Don't play |

We choose Play as our dependent attribute. Don't play = Negative; Play = Positive.
Outlook playdon’t play total Entropy
sunny 2 3 5 0.34676807
ovecast 4 0 4 0
rain 3 2 5 0.34676807
total 0.69353614
Temp
>70 5 4 9 0.63712032
<=70 4 1 5 0.25783146
total 0.89495179
Humidity
>70 6 4 10 0.69353614
<=70 3 1 4 0.23179375
total 0.92532989
Windy
TRUE 3 3 6 0.42857143
FALSE 6 2 8 0.4635875
total 0.89215893
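A few of these cells can be recomputed directly; the sketch below (helper name illustrative) evaluates (|Sv|/|S|)·Entropy(Sv) from a branch's positive/negative counts over the 14 examples.

```python
from math import log2

def weighted_entropy(pos, neg, total_examples):
    """One table cell: (|Sv|/|S|) * Entropy(Sv) for a branch with
    `pos` positive and `neg` negative cases out of `total_examples`."""
    size = pos + neg
    ent = sum(-c/size * log2(c/size) for c in (pos, neg) if c)
    return size/total_examples * ent

print(round(weighted_entropy(2, 3, 14), 8))  # Outlook=sunny: 0.34676807
print(weighted_entropy(4, 0, 14))            # Outlook=overcast: 0.0
print(round(weighted_entropy(3, 3, 14), 8))  # Windy=TRUE: 0.42857143
```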
The sunny subset (dependent attribute: Play):

| Outlook | Temperature | Humidity | Windy | Play |
|---------|-------------|----------|-------|------------|
| sunny | 85 | 85 | FALSE | Don't play |
| sunny | 80 | 90 | TRUE | Don't play |
| sunny | 72 | 95 | FALSE | Don't play |
| sunny | 69 | 70 | FALSE | Play |
| sunny | 75 | 70 | TRUE | Play |
play don’t play total Entropy
Temp
>70 1 3 4 0.6490225
<=70 1 0 1 0
total 0.6490225
Humidity
>70 0 3 3 0
<=70 2 0 2 0
total 0
Windy
TRUE 1 1 2 0.4
FALSE 1 2 3 0.5509775
total 0.9509775
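On the sunny subset, Humidity's weighted-entropy total is 0, so it is the attribute chosen. A sketch (helper name illustrative) reproducing the three totals from the branch counts:

```python
from math import log2

def branch_total(branches, total):
    """Sum of (|Sv|/|S|) * Entropy(Sv) over an attribute's branches;
    `branches` is a list of (pos, neg) counts and `total` = |S|."""
    result = 0.0
    for pos, neg in branches:
        size = pos + neg
        result += size/total * sum(-c/size * log2(c/size)
                                   for c in (pos, neg) if c)
    return result

# Sunny subset (5 examples): pick the attribute with the smallest total.
print(round(branch_total([(1, 3), (1, 0)], 5), 3))  # Temp: 0.649
print(round(branch_total([(0, 3), (2, 0)], 5), 3))  # Humidity: 0.0
print(round(branch_total([(1, 1), (1, 2)], 5), 3))  # Windy: 0.951
```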
The rain subset (dependent attribute: Play):

| Outlook | Temperature | Humidity | Windy | Play |
|---------|-------------|----------|-------|------------|
| rain | 70 | 95 | FALSE | Play |
| rain | 68 | 80 | FALSE | Play |
| rain | 65 | 70 | TRUE | Don't play |
| rain | 75 | 80 | FALSE | Play |
| rain | 71 | 80 | TRUE | Don't play |
play don’t play total Entropy
Temp
>70 1 1 2 0.4
<=70 2 1 3 0.5509775
total 0.9509775
Humidity
>70 1 0 3 0
<=70 3 1 4 0.6490225
total 0.6490225
Windy
TRUE 0 2 2 0
FALSE 3 0 3 0
total 0
The overcast subset (dependent attribute: Play):

| Outlook | Temperature | Humidity | Windy | Play |
|----------|-------------|----------|-------|------|
| overcast | 83 | 78 | FALSE | Play |
| overcast | 64 | 65 | TRUE | Play |
| overcast | 72 | 90 | TRUE | Play |
| overcast | 81 | 75 | FALSE | Play |
play don’t play total Entropy
Temp
>70 2 0 2 0
<=70 0 2 2 0
total 0
Humidity
>70 3 0 3 0
<=70 1 0 1 0
total 0
Windy
TRUE 2 0 2 0
FALSE 2 0 2 0
total 0
ID3 Summary

• Step 1: Take all unused attributes and compute their entropy with respect to the test samples.
• Step 2: Choose the attribute with the smallest entropy.
• Step 3: Make a node containing that attribute.
Growth stops when:

• every attribute already exists along the path through the tree, or
• the training examples associated with a leaf all have the same target attribute value.
References

• An Overview of Data Mining Techniques: http://www.thearling.com/text/dmtechniques/dmtechniques.htm
• Decision tree: http://en.wikipedia.org/wiki/Decision_tree
• Decision Trees: http://dms.irb.hr/tutorial/tut_dtrees.php