Learning with AdaBoost
Fall 2007
04/21/23 · Learning with Adaboost
Outline

- Introduction and background of Boosting and Adaboost
- Adaboost Algorithm example
- Adaboost Algorithm in the current project
- Experiment results
- Discussion and conclusion
Boosting Algorithm

Definition of Boosting [1]:
Boosting refers to a general method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules-of-thumb.

Boosting procedure [2]: given a set of labeled training examples (x_1, y_1), ..., (x_N, y_N), where y_i is the label associated with instance x_i, on each round t = 1, ..., T:
- The booster devises a distribution (importance) D_t over the example set.
- The booster requests a weak hypothesis (rule-of-thumb) h_t with low error ε_t.
After T rounds, the booster combines the weak hypotheses into a single prediction rule.
Boosting Algorithm (cont'd)

The intuitive idea: alter the distribution over the domain in a way that increases the probability of the "harder" parts of the space, thus forcing the weak learner to generate new hypotheses that make fewer mistakes on these parts.

Disadvantages:
- Requires prior knowledge of the accuracies of the weak hypotheses.
- The performance bound depends only on the accuracy of the least accurate weak hypothesis.
Background of Adaboost [2]
Adaboost Algorithm [2]
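The algorithm on this slide appeared as a figure in the original deck. As a rough sketch (not the paper's exact pseudocode; the function names and the choice of threshold stumps as WeakLearn are mine), binary AdaBoost on 1-D data can be written as:

```python
import numpy as np

def stump_predict(x, thresh, polarity):
    # weak hypothesis: predict sign(x - thresh), optionally flipped by polarity
    return polarity * np.sign(x - thresh + 1e-12)

def train_adaboost(x, y, T):
    n = len(x)
    D = np.full(n, 1.0 / n)               # round-1 distribution: uniform
    ensemble = []                          # (alpha_t, thresh, polarity) per round
    for t in range(T):
        # WeakLearn: exhaustive search for the stump with lowest weighted error
        best = None
        for thresh in x:
            for polarity in (-1.0, 1.0):
                err = D[stump_predict(x, thresh, polarity) != y].sum()
                if best is None or err < best[0]:
                    best = (err, thresh, polarity)
        err, thresh, polarity = best
        err = max(err, 1e-10)              # guard against log(0) for a perfect stump
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = stump_predict(x, thresh, polarity)
        D *= np.exp(-alpha * y * pred)     # up-weight mistakes, down-weight correct
        D /= D.sum()
        ensemble.append((alpha, thresh, polarity))
    return ensemble

def predict(ensemble, x):
    # final rule: sign of the alpha-weighted vote of the weak hypotheses
    return np.sign(sum(a * stump_predict(x, th, p) for a, th, p in ensemble))
```

On a 1-D pattern like the plus/minus toy example later in the deck (+ + + - - - - + + +), no single stump is correct, but three boosting rounds already classify every training point correctly.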
Advantages of Adaboost

- Adaboost adapts to the errors of the weak hypotheses returned by WeakLearn.
- Unlike the conventional boosting algorithm, the prior error need not be known ahead of time.
- The update rule reduces the probability assigned to the examples on which the hypothesis makes a good prediction and increases the probability of the examples on which the prediction is poor.
The error bound [3]

Suppose the weak learning algorithm WeakLearn, when called by Adaboost, generates hypotheses with errors ε_1, ..., ε_T. Then the error ε = Pr_{i~D}[h_f(x_i) ≠ y_i] of the final hypothesis h_f output by Adaboost is bounded above by

    ε ≤ 2^T ∏_{t=1}^{T} √(ε_t(1 − ε_t))

Note that the errors generated by WeakLearn are not uniform, and the final error depends on the errors of all the weak hypotheses. Recall that the errors of the previous boosting algorithms depend only on the maximal error of the weakest hypothesis and ignore the advantage that can be gained from hypotheses whose errors are smaller.
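The bound is easy to evaluate numerically, which makes the slide's point concrete: each factor 2√(ε_t(1 − ε_t)) is below 1 exactly when ε_t < 1/2, so every weak hypothesis that beats random guessing shrinks the bound, and more accurate rounds shrink it faster. A small helper (the function name is mine):

```python
import math

def adaboost_error_bound(errors):
    """2^T * prod_t sqrt(eps_t * (1 - eps_t)): the upper bound on the
    training error of the final hypothesis, given the per-round errors."""
    bound = 1.0
    for e in errors:
        bound *= 2.0 * math.sqrt(e * (1.0 - e))
    return bound
```

For example, three rounds with errors 0.3, 0.2, 0.1 give a bound of about 0.44, whereas treating every round as the worst one (0.3, as the older analyses do) would only give about 0.77.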
A toy example [2]

Training set: 10 points (represented by plus or minus).
Initial status: equal weights for all training samples.
A toy example (cont'd)

Round 1: three "plus" points are not correctly classified; they are given higher weights.
A toy example (cont'd)

Round 2: three "minus" points are not correctly classified; they are given higher weights.
A toy example (cont'd)

Round 3: one "minus" and two "plus" points are not correctly classified; they are given higher weights.
A toy example (cont'd)

Final classifier: integrate the three "weak" classifiers to obtain a final strong classifier.
Look at Adaboost [3] again
Adaboost (cont'd): Multi-class Extensions

The previous discussion is restricted to binary classification problems. The label set Y may contain any number of labels, which gives a multi-class problem.

The multi-class case (AdaBoost.M1) requires the accuracy of each weak hypothesis to be greater than ½. This condition is stronger in the multi-class case than in the binary classification case.
AdaBoost.M1
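The AdaBoost.M1 pseudocode on this slide was also a figure in the original deck. A minimal sketch (function and variable names are mine, and the weak learner is left abstract, as in the paper): each round's weighted error ε_t must stay below 1/2, correctly classified examples are scaled down by β_t = ε_t/(1 − ε_t), and the final hypothesis picks the label with the largest total log(1/β_t) vote:

```python
import numpy as np

def adaboost_m1(X, y, weak_learn, T):
    # weak_learn(X, y, D) is assumed to return a classifier h,
    # where h(X) -> array of predicted labels
    n = len(y)
    D = np.full(n, 1.0 / n)
    hyps, betas = [], []
    for t in range(T):
        h = weak_learn(X, y, D)
        wrong = h(X) != y
        eps = max(D[wrong].sum(), 1e-10)   # guard against a perfect round
        if eps >= 0.5:                     # M1 requires every h_t to beat 1/2
            break
        beta = eps / (1.0 - eps)
        D[~wrong] *= beta                  # shrink weights of correct examples
        D /= D.sum()
        hyps.append(h)
        betas.append(beta)
    return hyps, betas

def predict_m1(hyps, betas, X, labels):
    # final hypothesis: label with the largest total log(1/beta_t) vote
    votes = np.zeros((len(X), len(labels)))
    for h, b in zip(hyps, betas):
        pred = h(X)
        for j, lab in enumerate(labels):
            votes[pred == lab, j] += np.log(1.0 / b)
    return np.array(labels)[votes.argmax(axis=1)]
```

Unlike the binary case, which only needs the sign of a weighted sum, the multi-class vote has to be tallied per label, which is where the stronger better-than-½ condition comes from.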
Error Upper Bound of Adaboost.M1 [3]

As in the binary classification case, the error of the final hypothesis is also bounded:

    ε ≤ 2^T ∏_{t=1}^{T} √(ε_t(1 − ε_t))
How does Adaboost.M1 work? [4]
Adaboost in our project
Adaboost in our project

1) The initialization gives the target class the same total weight as all the other staff:
   bird[1,…,10] = ½ × 1/10;
   otherstaff[1,…,690] = ½ × 1/690;
2) A history record is preserved to strengthen the updating of the weights.
3) The unified model obtained from CPM alignment is used for the training process.
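The initialization in point 1) can be written out directly. This is a sketch, not the project's actual code; the array names follow the slide's bird/otherstaff naming:

```python
import numpy as np

# Point 1): the 10 target-class ("bird") samples share half of the total
# weight, and the 690 other-staff samples share the remaining half,
# instead of the uniform 1/700 each.
n_bird, n_other = 10, 690
bird = np.full(n_bird, 0.5 / n_bird)          # 0.05 per bird sample
otherstaff = np.full(n_other, 0.5 / n_other)  # ~0.000725 per other sample
D0 = np.concatenate([bird, otherstaff])       # initial Adaboost distribution
```

The effect is to keep the rare target class from being swamped by the 690 background samples in the early boosting rounds.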
Adaboost in our project

2) The history record:
[Figure: weight histogram with the history record, and weight histogram without the history record]
Adaboost in our project

3) The unified model obtained from CPM alignment is used for the training process. This has reduced the overfitting problem.
3.1) The overfitting problem.
3.2) The CPM model.
Adaboost in our project

3.1) The overfitting problem. Why does the trained Adaboost not work for birds 11-20? I have compared:
I) the rank of the alpha value for each of the 60 classifiers;
II) how each classifier actually detected birds in the training process;
III) how each classifier actually detected birds in the test process.
The covariance is also computed for comparison:

cov(c(:,1),c(:,2))
ans =  305.0000    6.4746
         6.4746  305.0000
K>> cov(c(:,1),c(:,3))
ans =  305.0000   92.8644
        92.8644  305.0000
K>> cov(c(:,2),c(:,3))
ans =  305.0000  -46.1186
       -46.1186  305.0000

Overfitted! The training data differ from the test data; this is very common.
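A side note on the 305.0000 diagonals above: the compared quantities appear to be rankings of the 60 classifiers, and the sample variance of the ranks 1..60 is 60·61/12 = 305, so the diagonals are fixed and only the off-diagonal entry measures how well two rankings agree. The same computation in NumPy, on hypothetical rank data:

```python
import numpy as np

# The sample variance of the ranks 1..60 is 60*61/12 = 305, so the diagonal
# of cov() between any two rankings of 60 classifiers is always 305; only
# the off-diagonal entry says how strongly the two rankings co-vary.
rng = np.random.default_rng(0)
ranks_a = np.arange(1, 61)            # e.g. classifiers ranked by alpha value
ranks_b = rng.permutation(ranks_a)    # a hypothetical second ranking
C = np.cov(ranks_a, ranks_b)          # 2x2, like MATLAB's cov(c(:,1), c(:,2))
```

On this scale, 6.47 (train vs. alpha rank) is near-zero agreement, while 92.86 (train vs. test) is still far from the 305 that perfect agreement would give.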
Adaboost in our project

Train result (covariance: 6.4746)
Adaboost in our project

Comparison of train and test results (covariance: 92.8644)
Adaboost in our project

3.2) CPM: the continuous profile model, put forward by Jennifer Listgarten. It is very useful for data alignment.
Adaboost in our project

The alignment results from the CPM model:
[Figure: "Aligned and Scaled Data" (x-axis: Latent Time) and "Unaligned and Unscaled Data" (x-axis: Experimental Time)]
Adaboost in our project

The unified model from CPM alignment:
[Figure: the unified model without resampling (left) and after upsampling and downsampling (right)]
Adaboost in our project

The influence of CPM on the history record:
[Figure: "History Record (using CPM Alignment)" and "History Record (without CPM Alignment)"]
Browse all birds
Curvature Descriptor
Distance Descriptor
Adaboost without CPM
Adaboost without CPM (cont'd)
Good_Part_Selected (Adaboost without CPM, cont'd)
Adaboost without CPM (cont'd): The Alpha Values

Other statistical data: zero rate 0.5333; covariance 0.0074; median 0.0874
0.075527 0 0.080877 0.168358 0 0
0 0 0.146951 0.007721 0.218146 0
0.081063 0 0 0.060681 0 0
0.197824 0 0.08873 0 0.080742 0.015646
0 0.080659 0.269843 0 0.028159 0
0 0.19772 0.086019 0.217678 0 0.21836
0 0.080554 0 0 0 0.190074
0 0.21237 0 0 0 0
0 0.060744 0 0 0 0
0.179449 0.338801 0.080667 0.080895 0 0.267993
Adaboost with CPM
Adaboost with CPM (cont'd)
Adaboost with CPM (cont'd)
Good_Part_Selected (Adaboost with CPM, cont'd)
Adaboost with CPM (cont'd): The Alpha Values

Other statistical data: zero rate 0.6167; covariance 0.9488; median 1.6468
2.521895 0 2.510827 0.714297 0 0
1.646754 0 0 0 0 0
2.134926 0 2.167948 0 2.526712 0
0.279277 0 0 0 0.0635 2.322823
0 0 2.516785 0 0 0
0 0.04174 0 0.207436 0 0
0 0 1.30396 0 0 0.951666
0 2.513161 2.530245 0 0 0
0 0 0 0.041627 2.522551 0
0.72565 0 2.506505 1.303823 0 1.611553
Conclusion and discussion

1) Adaboost works with the CPM unified model. This model has smoothed the training data set and decreased the influence of overfitting.
2) The influence of the history record is very interesting: it suppresses noise and strengthens the WeakLearn boosting direction.
3) The step length of the KNN selected by Adaboost is not discussed here. It is also useful for suppressing noise.
Conclusion and discussion (cont'd)

4) Adaboost does not rely on the training order. The obtained alpha values have very similar distributions for all the classifiers. There are two examples.
Example 1: four different training orders (6 birds each) obtained the following alpha values:
Alpha_All1 = 0.4480 0.1387 0.2074 0.5949 0.5868 0.3947 0.3874 0.5634 0.6694 0.7447
Alpha_All2 = 0.3998 0.0635 0.2479 0.6873 0.5868 0.2998 0.4320 0.5581 0.6946 0.7652
Alpha_All3 = 0.4191 0.1301 0.2513 0.5988 0.5868 0.2920 0.4286 0.5503 0.6968 0.7134
Alpha_All4 = 0.4506 0.0618 0.2750 0.5777 0.5701 0.3289 0.5948 0.5857 0.7016 0.6212
Conclusion and discussion (cont'd)
Conclusion and discussion (cont'd)

Example 2: 60 parts from the Curvature Descriptor, 60 from the Distance Descriptor:
1) They are trained independently at first;
2) Then they are combined and trained together.
The results are as follows:
Conclusion and discussion (cont'd)
Conclusion and discussion (cont'd)

5) How to combine the curvature and distance descriptors is another important problem. Currently I can obtain nice results by combining them: all 10 birds are found.
Are they stable for all other classes? How can the improved Adaboost be integrated to combine the two descriptors? Maybe Adaboost will improve even further (for general objects, for example, elephants or camels).
Conclusion and discussion (cont'd)

Current results without Adaboost:
Conclusion and discussion (cont'd)

6) How about the influence of the search order? Could we try to reverse it? My current result has improved by one more bird, but not by much.
7) How many models could we obtain from the CPM model? Currently I am using only one unified model.
8) Why does the rescaled model not work? (I do not think curvature is so sensitive to rescaling.)
9) Could we try to boost the Neural Network?
Conclusion and discussion (cont'd)

10) Could we try to change the boosting function? Currently I am using the logistic regression projection function to transmit the error information to the alpha values; there are many other methods for this, for example C4.5, decision stump, decision table, naive Bayes, voted perceptron, ZeroR, etc.
11) Could a decision tree replace Adaboost? I think this would impede the search speed, but I am not sure about the quality.
Conclusion and discussion (cont'd)

12) How about fuzzy SVM, or SVM, to address this good-parts selection problem?
13) How should we understand the difference between good parts selected by the computer and by a human? (Do the parts from the computer program have similar semantic meanings?)
14) How about the stability of the Curvature and Distance Descriptors?
Thanks!
References
[1] Yoav Freund, Robert Schapire. A Short Introduction to Boosting.
[2] Robert Schapire. The Boosting Approach to Machine Learning. Princeton University.
[3] Yoav Freund, Robert Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting.
[4] R. Polikar. Ensemble Based Systems in Decision Making. IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.