Chapter II.6 (Book Part VI) Learning
Transcript of Chapter II.6 (Book Part VI) Learning
![Page 1: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/1.jpg)
Machine Learning
Erica Melis
![Page 2: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/2.jpg)
Introduction to Machine Learning
Decision Trees
Overfitting
A Little Introduction Only
Artificial Neural Nets
![Page 3: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/3.jpg)
Why Machine Learning (1)
• Growing flood of online data
• Budding industry
• Computational power is available
• Progress in algorithms and theory
![Page 4: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/4.jpg)
Why Machine Learning (2)

• Data mining: using historical data to improve decisions
  – medical records ⇒ medical knowledge
  – log data to model the user
• Software applications we can't program by hand
  – autonomous driving
  – speech recognition
• Self-customizing programs
  – newsreader that learns user interests
![Page 5: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/5.jpg)
Some success stories

• Data mining, learning on the Web
• Analysis of astronomical data
• Human speech recognition
• Handwriting recognition
• Fraudulent use of credit cards
• Driving autonomous vehicles
• Predicting stock rates
• Intelligent elevator control
• World-champion backgammon
• Robot soccer
• DNA classification
![Page 6: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/6.jpg)
Problems Too Difficult to Program by Hand
ALVINN drives 70 mph on highways
![Page 7: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/7.jpg)
Credit Risk Analysis

If Other-Delinquent-Accounts > 2, and Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = No [Deny Credit Card application]

If Other-Delinquent-Accounts = 0, and (Income > $30k) OR (Years-of-Credit > 3)
Then Profitable-Customer? = Yes [Accept Credit Card application]
Machine Learning, T. Mitchell, McGraw Hill, 1997
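To make the rule format concrete, here is a minimal Python sketch encoding these two rules; the Applicant structure and its field names are illustrative assumptions, not part of the source.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    other_delinquent_accounts: int
    delinquent_billing_cycles: int
    income: float            # annual income in dollars
    years_of_credit: float

def profitable_customer(a: Applicant):
    """Apply the two rules above; True = accept, False = deny, None = no rule fires."""
    if a.other_delinquent_accounts > 2 and a.delinquent_billing_cycles > 1:
        return False  # deny credit card application
    if a.other_delinquent_accounts == 0 and (a.income > 30_000 or a.years_of_credit > 3):
        return True   # accept credit card application
    return None

print(profitable_customer(Applicant(0, 0, 45_000.0, 1.0)))  # -> True
```

A learned classifier of this kind is simply a set of such if-then rules over the applicant's attributes.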
![Page 8: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/8.jpg)
Typical Data Mining Task

Given:
• 9714 patient records, each describing a pregnancy and birth
• each patient record contains 215 features

Learn to predict:
• classes of future patients at high risk for Emergency Cesarean Section
Machine Learning, T. Mitchell, McGraw Hill, 1997
![Page 9: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/9.jpg)
Data Mining Result

IF No previous vaginal delivery,
and Abnormal 2nd Trimester Ultrasound,
and Malpresentation at admission
THEN Probability of Emergency C-Section is 0.6

Over training data: 26/41 = .63; over test data: 12/20 = .60
Machine Learning, T. Mitchell, McGraw Hill, 1997
![Page 10: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/10.jpg)
![Page 11: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/11.jpg)
How does an Agent learn?

[Diagram: knowledge-based inductive learning. Prior knowledge (B) and observations (E) feed into hypotheses (H), which produce predictions.]
![Page 12: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/12.jpg)
Machine Learning Techniques

• Decision tree learning
• Artificial neural networks
• Naive Bayes
• Bayesian net structures
• Instance-based learning
• Reinforcement learning
• Genetic algorithms
• Support vector machines
• Explanation-based learning
• Inductive logic programming
![Page 13: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/13.jpg)
What is the Learning Problem?

Learning = improving with experience at some task:
• improve over task T
• with respect to performance measure P
• based on experience E
![Page 14: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/14.jpg)
The Game of Checkers
![Page 15: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/15.jpg)
Learning to Play Checkers

• T: play checkers
• P: percent of games won in the world tournament
• E: games played against itself

• What exactly should be learned?
• How shall it be represented?
• What specific algorithm should be used to learn it?
![Page 16: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/16.jpg)
A Representation for the Learned Function V'(b)

Target function: V: Board → ℝ

Target function representation:
V'(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6

where
• x1: number of black pieces on board b
• x2: number of red pieces on board b
• x3: number of black kings on board b
• x4: number of red kings on board b
• x5: number of red pieces threatened by black (i.e., which can be taken on black's next turn)
• x6: number of black pieces threatened by red
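As a sketch, this linear evaluation function is a one-liner in Python; the weights and feature values below are made-up numbers for illustration, and extracting x1..x6 from an actual board representation is not implemented here.

```python
def v_prime(features, weights):
    """V'(b) = w0 + w1*x1 + ... + w6*x6, with features = [x1..x6], weights = [w0..w6]."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

weights = [0.5, 1.0, -1.0, 2.0, -2.0, 1.5, -1.5]  # illustrative values, not learned
features = [12, 12, 0, 0, 1, 2]                   # x1..x6 for some board b (made up)
print(v_prime(features, weights))                 # 0.5 + 12 - 12 + 1.5 - 3.0 = -1.0
```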
![Page 17: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/17.jpg)
Function Approximation Algorithm*

• V(b): the true target function
• V'(b): the learned function
• Vtrain(b): the training value
• (b, Vtrain(b)): a training example

One rule for estimating training values:
Vtrain(b) ← V'(Successor(b)) for intermediate boards b
![Page 18: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/18.jpg)
Contd.: Choose Weight Tuning Rule*

LMS weight update rule. Do repeatedly:

1. Select a training example b at random
2. Compute error(b) with the current weights:
   error(b) = Vtrain(b) − V'(b)
3. For each board feature xi, update weight wi:
   wi ← wi + c · xi · error(b)

c is a small constant that moderates the rate of learning
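A minimal sketch of one LMS step under the linear representation above; the constant c, the feature list, and the training value are supplied by a surrounding training loop, which is only indicated in the comments.

```python
def lms_update(weights, features, v_train, c=0.1):
    """One LMS step: wi <- wi + c * xi * error(b), with x0 = 1 so w0 is updated too."""
    prediction = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    error = v_train - prediction            # error(b) = Vtrain(b) - V'(b)
    xs = [1.0] + list(features)             # prepend x0 = 1 for the bias weight w0
    return [w + c * x * error for w, x in zip(weights, xs)]

# Training loop sketch: repeatedly pick a board b at random, estimate
# Vtrain(b) from the successor position, and apply one update step:
weights = [0.0] * 7
weights = lms_update(weights, [12, 12, 0, 0, 1, 2], v_train=1.0)
```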
![Page 19: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/19.jpg)
...A.L. Samuel
![Page 20: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/20.jpg)
Design Choices for Checker Learning
![Page 21: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/21.jpg)
Overview

Introduction to Machine Learning
Inductive Learning
Decision Trees, Ensemble Learning, Overfitting
Artificial Neural Nets
![Page 22: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/22.jpg)
Supervised Inductive Learning (1)

Why is learning difficult?

• a hypothesis obtained by inductive learning generalizes from specific examples; it cannot be proven true, only proven false
• it is not easy to tell whether a hypothesis h is a good approximation of the target function f
• there is a trade-off between the complexity of the hypothesis and how well it fits the data
![Page 23: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/23.jpg)
Supervised Inductive Learning (2)

To generalize beyond the specific examples, one needs constraints or biases on what h is best. For that purpose, one has to specify

• the overall class of candidate hypotheses
  → restricted hypothesis space bias
• a metric for comparing candidate hypotheses, to determine whether one is better than another
  → preference bias
![Page 24: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/24.jpg)
Supervised Inductive Learning (3)

Having fixed the bias, learning can be considered as search in the hypothesis space, guided by the chosen preference bias.
![Page 25: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/25.jpg)
Decision Tree Learning (Quinlan 1986, Feigenbaum 1961)

temperature = hot & windy = true & humidity = normal & outlook = sunny ⇒ PlayTennis = ?

Goal predicate: PlayTennis
Hypothesis space:
Preference bias:
![Page 26: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/26.jpg)
Illustrating Example (Russell & Norvig)
The problem: wait for a table in a restaurant?
![Page 27: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/27.jpg)
Illustrating Example: Training Data
![Page 28: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/28.jpg)
A Decision Tree for WillWait (SR)
![Page 29: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/29.jpg)
Path in the Decision Tree

[worked on the blackboard]
![Page 30: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/30.jpg)
General Approach

• let A1, A2, ..., An be discrete attributes, i.e. each attribute has finitely many values
• let B be another discrete attribute, the goal attribute

Learning goal:
learn a function f: A1 × A2 × ... × An → B

Examples:
elements from A1 × A2 × ... × An × B
![Page 31: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/31.jpg)
General Approach
Restricted hypothesis space bias:
the collection of all decision trees over the attributes A1, A2, ..., An, and B forms the set of possible candidate hypotheses
Preference bias:
prefer small trees consistent with the training examples
![Page 32: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/32.jpg)
Decision Trees: Definition (for record)
A decision tree over the attributes A1, A2,.., An, and B is a tree in which
• each non-leaf node is labelled with one of the attributes A1, A2, ..., and An
• each leaf node is labelled with one of the possible values for the goal attribute B
• a non-leaf node with the label Ai has as many outgoing arcs as there are possible values for the attribute Ai; each arc is labelled with one of the possible values for Ai
![Page 33: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/33.jpg)
Decision Trees: Application of Tree (for record)
Let x be an element from A1 × A2 × ... × An and let T be a decision tree.

The element x is processed by the tree T starting at the root and following the appropriate arcs until a leaf is reached; x then receives the value assigned to that leaf.
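As a minimal sketch of this procedure: one possible encoding (an assumption for illustration, not from the slide) represents a leaf as a plain value and a non-leaf node as a pair of the tested attribute and a dict of arcs.

```python
# A leaf is a plain value; a non-leaf node is (attribute_name, {value: subtree}).
def classify(tree, x):
    """Process element x (a dict attribute -> value) through decision tree T."""
    while isinstance(tree, tuple):          # follow arcs until a leaf is reached
        attribute, branches = tree
        tree = branches[x[attribute]]       # assumes x's value has an arc
    return tree                             # the value assigned to the leaf

# Tiny example tree: test Outlook, then Wind under Rain.
tree = ("Outlook", {
    "Sunny": "No",
    "Overcast": "Yes",
    "Rain": ("Wind", {"Weak": "Yes", "Strong": "No"}),
})
print(classify(tree, {"Outlook": "Rain", "Wind": "Weak"}))   # -> "Yes"
```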
![Page 34: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/34.jpg)
Expressiveness of Decision Trees
Any boolean function can be written as a decision tree.
[Figure: truth table over attributes A1, A2 with goal attribute B, and the equivalent decision tree, which tests A1 at the root, A2 at the next level, and carries the values of B at its leaves]
![Page 35: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/35.jpg)
Decision Trees

• fully expressive within the class of propositional languages
• in some cases, decision trees are not appropriate:
  – sometimes exponentially large decision trees are needed (e.g. for the parity function, which returns 1 iff an even number of inputs are 1)
  – the replicated subtree problem, e.g. when coding the following two rules in a tree:
    "if A1 and A2 then B"
    "if A3 and A4 then B"
![Page 36: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/36.jpg)
Decision Trees

Finding a smallest decision tree that is consistent with a set of given examples is an NP-hard problem
(smallest = minimal in the overall number of nodes).

Instead of constructing a smallest decision tree, the focus is on constructing a reasonably small one
⇒ greedy algorithm
![Page 37: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/37.jpg)
Inducing Decision Trees: Algorithm (for record)

function DECISION-TREE-LEARNING(examples, attribs, default) returns a decision tree
   inputs: examples, set of examples
           attribs, set of attributes
           default, default value for the goal predicate

   if examples is empty then return default
   else if all examples have the same classification then return the classification
   else if attribs is empty then return MAJORITY-VALUE(examples)
   else
       best ← CHOOSE-ATTRIBUTE(attribs, examples)
       tree ← a new decision tree with root test best
       m ← MAJORITY-VALUE(examples)
       for each value vi of best do
           examplesi ← {elements of examples with best = vi}
           subtree ← DECISION-TREE-LEARNING(examplesi, attribs − best, m)
           add a branch to tree with label vi and subtree subtree
       return tree
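For concreteness, here is a minimal Python sketch of this algorithm, under some assumptions not in the slide: examples are dicts mapping attribute names to values (including the goal attribute), the tree uses the same (attribute, branches) encoding as the classification sketch above, and CHOOSE-ATTRIBUTE is passed in as a function (e.g. one that maximizes information gain).

```python
from collections import Counter

def majority_value(examples, goal):
    """MAJORITY-VALUE: most common value of the goal attribute."""
    return Counter(e[goal] for e in examples).most_common(1)[0][0]

def decision_tree_learning(examples, attribs, default, goal, choose_attribute):
    if not examples:
        return default
    classifications = {e[goal] for e in examples}
    if len(classifications) == 1:                  # all examples agree
        return classifications.pop()
    if not attribs:
        return majority_value(examples, goal)
    best = choose_attribute(attribs, examples)
    m = majority_value(examples, goal)
    branches = {}
    for v in {e[best] for e in examples}:          # values of best seen in the data
        examples_v = [e for e in examples if e[best] == v]
        branches[v] = decision_tree_learning(
            examples_v, [a for a in attribs if a != best], m, goal, choose_attribute)
    return (best, branches)                        # same encoding as the sketch above
```

Because the two sketches share the tree encoding, a tree built here can be applied directly with the classify function shown earlier.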
![Page 38: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/38.jpg)
![Page 39: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/39.jpg)
Training Examples

| Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| D1  | Sunny    | Hot         | High     | Weak   | No         |
| D2  | Sunny    | Hot         | High     | Strong | No         |
| D3  | Overcast | Hot         | High     | Weak   | Yes        |
| D4  | Rain     | Mild        | High     | Weak   | Yes        |
| D5  | Rain     | Cool        | Normal   | Weak   | Yes        |
| D6  | Rain     | Cool        | Normal   | Strong | No         |
| D7  | Overcast | Cool        | Normal   | Strong | Yes        |
| D8  | Sunny    | Mild        | High     | Weak   | No         |
| D9  | Sunny    | Cool        | Normal   | Weak   | Yes        |
| D10 | Rain     | Mild        | Normal   | Weak   | Yes        |
| D11 | Sunny    | Mild        | Normal   | Strong | Yes        |
| D12 | Overcast | Mild        | High     | Strong | Yes        |
| D13 | Overcast | Hot         | Normal   | Weak   | Yes        |
| D14 | Rain     | Mild        | High     | Strong | No         |
T. Mitchell, 1997
![Page 40: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/40.jpg)
![Page 41: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/41.jpg)
![Page 42: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/42.jpg)
Entropy (n = 2)

• S is a sample of training examples
• p+ is the proportion of positive examples in S
• p− is the proportion of negative examples in S
• Entropy measures the impurity of S:

Entropy(S) ≡ −p+ log2 p+ − p− log2 p−
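As a quick sketch, the formula translates directly to Python; the 9-positive / 5-negative split in the usage line comes from the PlayTennis table above.

```python
import math

def entropy(pos, neg):
    """Entropy of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:                      # convention: 0 * log2(0) = 0
            result -= p * math.log2(p)
    return result

print(entropy(9, 5))   # PlayTennis data: 9 Yes, 5 No -> about 0.940
```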
![Page 43: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/43.jpg)
![Page 44: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/44.jpg)
![Page 45: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/45.jpg)
![Page 46: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/46.jpg)
![Page 47: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/47.jpg)
Example WillWait (do it yourself)
the problem of whether to wait for a table in a restaurant
![Page 48: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/48.jpg)
WillWait (do it yourself)
Which attribute to choose?
![Page 49: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/49.jpg)
Learned Tree WillWait
![Page 50: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/50.jpg)
Assessing Decision Trees

Assessing the performance of a learning algorithm: a learning algorithm has done a good job if its final hypothesis predicts the value of the goal attribute of unseen examples correctly.

General strategy (cross-validation):
1. collect a large set of examples
2. divide it into two disjoint sets: the training set and the test set
3. apply the learning algorithm to the training set, generating a hypothesis h
4. measure the quality of h applied to the test set
5. repeat steps 1 to 4 for different sizes of training sets and different randomly selected training sets of each size
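A minimal sketch of this strategy, assuming a learn(training_set) function, a classify_fn(h, x) function, and examples given as (x, y) pairs; none of these names come from the slides.

```python
import random

def assess(learn, classify_fn, examples, train_fraction=0.7, trials=10):
    """Estimate hypothesis quality: split into disjoint training/test sets,
    learn h on the training set, measure it on the test set, and repeat
    over different random splits (steps 2-5 above)."""
    accuracies = []
    for _ in range(trials):
        shuffled = random.sample(examples, len(examples))
        cut = int(train_fraction * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        h = learn(train)                                   # step 3: hypothesis h
        correct = sum(classify_fn(h, x) == y for x, y in test)
        accuracies.append(correct / len(test))             # step 4: quality of h
    return sum(accuracies) / len(accuracies)
```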
![Page 51: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/51.jpg)
When is Decision Tree Learning appropriate?

• Instances are represented by attribute-value pairs
• The target function has discrete values
• Disjunctive descriptions may be required
• The training data may contain missing or noisy data
![Page 52: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/52.jpg)
Extensions and Problems

• dealing with continuous attributes
  – select thresholds defining intervals; as a result each interval becomes a discrete value
  – dynamic programming methods to find appropriate split points, still expensive

• missing attributes
  – introduce a new value
  – use default values (e.g. the majority value)

• highly-branching attributes
  – e.g. Date has a different value for every example; the information gain measure GainRatio = Gain / SplitInformation penalizes broad and uniform splits
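As a sketch of the penalty term: SplitInformation is the entropy of the partition that an attribute induces on S (Mitchell's definition); the dict encoding of examples is an assumption carried over from the earlier sketches, and Gain is assumed to be computed elsewhere.

```python
import math
from collections import Counter

def split_information(examples, attribute):
    """SplitInformation(S, A): entropy of the partition of S by A's values."""
    total = len(examples)
    counts = Counter(e[attribute] for e in examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(gain, examples, attribute):
    """GainRatio = Gain / SplitInformation; penalizes highly-branching
    attributes like Date that split S into many small subsets."""
    si = split_information(examples, attribute)
    return gain / si if si > 0 else 0.0   # si = 0 when A has a single value
```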
![Page 53: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/53.jpg)
Extensions and Problems

• noise
  e.g. two or more examples with the same description but different classifications
  → leaf nodes report the majority classification for their set, or report the estimated probability (relative frequency)

• overfitting
  the learning algorithm uses irrelevant attributes to find a hypothesis consistent with all examples
  → pruning techniques, e.g. new non-leaf nodes will only be introduced if the information gain is larger than a particular threshold
![Page 54: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/54.jpg)
Overview

Introduction to Machine Learning
Inductive Learning: Decision Trees, Overfitting, Artificial Neural Nets
![Page 55: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/55.jpg)
Overfitting in Decision Trees
Consider adding training example #15:
Sunny, Hot, Normal, Strong, PlayTennis = No
What effect on earlier tree?
![Page 56: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/56.jpg)
Overfitting

Consider the error of hypothesis h over
• the training data: errortrain(h)
• the entire distribution D of the data: errorD(h)

Hypothesis h ∈ H overfits the training data if there is an alternative hypothesis h' ∈ H such that

errortrain(h) < errortrain(h')   and   errorD(h) > errorD(h')
![Page 57: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/57.jpg)
Overfitting in Decision Tree Learning
T. Mitchell, 1997
![Page 58: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/58.jpg)
Avoiding Overfitting

• stop growing when the data split is not statistically significant
• grow the full tree, then post-prune

How to select the "best" tree:
• measure performance over the training data (threshold)
• statistical significance test of whether expanding or pruning at a node will improve beyond the training set
• measure performance over a separate validation data set (utility of post-pruning), general cross-validation
• use an explicit measure for the encoding complexity of tree and training data, MDL heuristics
![Page 59: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/59.jpg)
Reduced-Error Pruning

Split the data into a training and a validation set.

Do until further pruning is harmful:
1. evaluate the impact on the validation set of pruning each possible node (plus those below it)
2. greedily remove the one that most improves validation set accuracy

• produces the smallest version of the most accurate subtree
• What if data is limited?

lecture slides for textbook Machine Learning, T. Mitchell, McGraw Hill, 1997
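A sketch of the pruning loop, assuming the (attribute, branches) tree encoding and the classify helper from the earlier sketches, validation data as (x, y) pairs, and a single global majority class in place of the per-node majority bookkeeping a full implementation would keep.

```python
def accuracy(tree, validation):
    """Fraction of (x, y) validation pairs the tree classifies correctly."""
    return sum(classify(tree, x) == y for x, y in validation) / len(validation)

def prunings(tree, majority_class):
    """Yield every tree obtainable by replacing one non-leaf node with a leaf."""
    if not isinstance(tree, tuple):
        return
    yield majority_class                      # prune at this node itself
    attribute, branches = tree
    for value, subtree in branches.items():   # or prune somewhere below
        for pruned in prunings(subtree, majority_class):
            new_branches = dict(branches)
            new_branches[value] = pruned
            yield (attribute, new_branches)

def reduced_error_pruning(tree, validation, majority_class):
    """Greedily apply the single pruning that most improves validation
    accuracy; stop as soon as no pruning helps."""
    best, best_acc = tree, accuracy(tree, validation)
    improved = True
    while improved:
        improved = False
        for candidate in prunings(best, majority_class):
            acc = accuracy(candidate, validation)
            if acc > best_acc:
                best, best_acc, improved = candidate, acc, True
    return best
```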
![Page 60: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/60.jpg)
Effect of Reduced-Error Pruning
lecture slides for textbook Machine Learning, T. Mitchell, McGraw Hill, 1997
![Page 61: Chapter II.6 (Book Part VI) Learning](https://reader033.fdocuments.us/reader033/viewer/2022061204/546f8949af79599a0a8b45a4/html5/thumbnails/61.jpg)
Chapter 6.1 – Learning from Observation
Software that Customizes to User

Recommender systems (Amazon, ...)