DECISION TREES
Asher Moody, CS 157B
Overview
- Definition
- Motivation
- Algorithms
- ID3 Example
- Entropy
- Information Gain
- Applications
- Conclusion
Decision Tree
Decision trees are a fundamental technique used in data mining. They are used for classification, clustering, feature selection, and prediction.
Motivation
- Decision trees help accurately classify data.
- Decision trees help us understand the predictive nature of the data by recognizing patterns.
- Decision trees depict the relationships between input data and target outputs.
Algorithms
Decision tree algorithms are greedy: once a test has been selected to partition the data, other options are not explored.
Popular algorithms:
- Computer science: ID3, C4.5, and C5.0
- Statistics: Classification and Regression Trees (CART)
ID3 Algorithm
Given: examples (S), target attribute (C), attributes (R). Initialize Root.

Function ID3(S, C, R):
- Create a Root node for the tree.
- IF S is empty, return a single node with value Failure.
- IF every example in S has the same value of C, return a single node labeled with that value.
- IF R is empty, return a single node labeled with the most frequent value of the target attribute C.
- ELSE BEGIN ... (next slide)
ID3 (cont.)
BEGIN
- Let D be the attribute with the largest Gain(D, S) among the attributes in R.
- Let {dj | j = 1, 2, ..., n} be the values of attribute D.
- Let {Sj | j = 1, 2, ..., n} be the subsets of S consisting respectively of the records with value dj for attribute D.
- Return a tree with root labeled D and arcs d1, d2, ..., dn going respectively to the subtrees.
- For each branch in the tree: IF Sj is empty, add a leaf labeled with the most frequent value of C; ELSE recurse with ID3(S1, C, R - {D}), ID3(S2, C, R - {D}), ..., ID3(Sn, C, R - {D}).
END ID3
Return Root
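The ID3 pseudocode can be sketched end to end in Python. This is a minimal illustration, not the slides' own code: the dict-per-record representation, the function names, the toy data, and the (attribute, {value: subtree}) return convention are assumptions; entropy and gain follow the definitions the deck gives on the later slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr):
    """Information gain from partitioning the records on `attr`."""
    n = len(labels)
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Return a leaf label, or (best_attr, {value: subtree})."""
    if len(set(labels)) == 1:          # all records share one class
        return labels[0]
    if not attrs:                      # no attributes left to test
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute D with the largest Gain(D, S).
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    subtrees = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        subtrees[v] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                          [a for a in attrs if a != best])
    return (best, subtrees)

# Hypothetical toy data: the class depends only on "outlook".
rows = [{"outlook": "sunny", "windy": "n"}, {"outlook": "sunny", "windy": "y"},
        {"outlook": "rain", "windy": "n"}, {"outlook": "rain", "windy": "y"}]
labels = ["yes", "yes", "no", "no"]
tree = id3(rows, labels, ["outlook", "windy"])
# tree[0] == "outlook"; tree[1] == {"sunny": "yes", "rain": "no"}
```

One simplification: the slide's empty-branch case (a value dj with no records) cannot arise here, because branches are built only from values present in S; a fuller implementation would enumerate every known value of D and attach a most-frequent-class leaf for the missing ones.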
Example 1
Example 2
Entropy
Entropy gives us a measure of how uncertain we are about the data.
Maximum: the measure should be maximal when all outcomes are equally likely (uncertainty is highest when all possible events are equiprobable).

Entropy(S) = - Σi Pi log2(Pi)

where Pi is the proportion of instances in the dataset that take the ith value of the target attribute.
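As a minimal sketch (the function name and the list-of-labels representation are assumptions, not from the slides), the formula translates directly into Python:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i Pi * log2(Pi), with Pi the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

entropy(["+", "-"])             # 1.0: a 50/50 split is maximally uncertain
entropy(["+", "+", "+"])        # == 0.0: a pure set has no uncertainty
entropy(["+"] * 9 + ["-"] * 5)  # roughly 0.940 bits
```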
Information Gain
Gain calculates the reduction in entropy (gain in information) that would result from splitting the data on a particular attribute A:

Gain(S, A) = Entropy(S) - Σv in Values(A) (|Sv| / |S|) Entropy(Sv)

where v is a value of A, Sv is the subset of instances of S where A takes the value v, |Sv| is the size of that subset, and |S| is the number of instances.
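As a minimal sketch of the formula (the function names, the tuple-per-row representation, and the toy data are illustrative assumptions): Gain(S, A) subtracts the weighted entropy of each subset Sv from the entropy of S.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Gain(S, A) = Entropy(S) - sum_v (|Sv| / |S|) * Entropy(Sv)."""
    n = len(labels)
    remainder = 0.0
    for v in set(row[attr_index] for row in rows):
        sv = [lab for row, lab in zip(rows, labels) if row[attr_index] == v]
        remainder += len(sv) / n * entropy(sv)
    return entropy(labels) - remainder

# Hypothetical rows: (outlook, temperature); outlook alone decides the class.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
information_gain(rows, labels, 0)  # 1.0: outlook removes all uncertainty
information_gain(rows, labels, 1)  # 0.0: temperature tells us nothing
```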
Applications
- Business: track purchasing patterns
- Medicine: identify potential risks associated with diseases
- Banking: identify potential credit risks
- Government: determine features of potential terrorists
- Seismology: predict earthquakes
Conclusion
- Search through the attributes to find the class proportions.
- Calculate the entropy for each possible value of a particular attribute.
- Calculate the gain for each attribute.
- Make the attribute with the highest gain the root node.
- Continue the process until the decision tree is complete.