Incremental learning of decision trees from time-changing data streams
Introduction Incremental decision tree learning Evaluation Results Appendix References
Incremental decision tree learning from time-changing data streams
Blaz Sovdat
Artificial Intelligence Laboratory, Jozef Stefan Institute
October 15, 2013
Talk outline
1 Introduction: Motivation; Classical decision tree learning
2 Incremental decision tree learning: Incremental classification tree learning
3 Evaluation: Assessing learning performance; Learning algorithm comparison
4 Results: Data description; Results; Prequential fading error estimation
Motivation
- In certain scenarios data arrive continuously and are unbounded (data streams): sensor networks, search queries, road traffic, network traffic
- There is no control over the order and speed of arrival
- Because of limited working memory, we may view each example only once
- The source distribution may change over time (concept drift)
- Classical (batch) decision tree learning methods fail in this setting
Classical decision tree learning
[Tree diagram] The final decision tree that ID3 induces on the Titanic data (reconstructed from the slide):

sex = female:
    status = first  → yes
    status = second → yes
    status = third  → age: adult → no, child → no
    status = crew   → yes
sex = male:
    status = first  → age: adult → no, child → yes
    status = second → no
    status = third  → age: adult → no, child → no
    status = crew   → no
The following ID3 learner is due to [Quinlan, 1986]. Let S be a set of training examples. Find the attribute A* that alone best classifies the examples from S:

Define a heuristic measure, say information gain:

G(A, S) := H(S) − Σ_{i=1}^{d} (|S_i| / |S|) · H(S_i),

where H(·) is the entropy of the class labels and S_1, …, S_d are the subsets of S induced by the d values of A. Then pick the best attribute:

A* = arg max_{A ∈ 𝒜} G(A, S)

Partition S into S_i := {x ∈ S : A*(x) = a_i} for all values a_i of A*, and create a leaf node for each partition. Recursively apply the procedure to the examples S_i at the children nodes.
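As a concrete illustration of the split-selection step, here is a minimal sketch of the information-gain computation. The helper names and the toy four-passenger sample are illustrative, not from the talk:

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """H(S): entropy of the class-label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(examples, labels, attr):
    """G(A, S) = H(S) - sum_i |S_i|/|S| * H(S_i), partitioning on `attr`."""
    parts = defaultdict(list)
    for x, y in zip(examples, labels):
        parts[x[attr]].append(y)
    n = len(labels)
    return entropy(labels) - sum(len(p) / n * entropy(p) for p in parts.values())

# Toy Titanic-style sample: (status, age, sex) labeled yes/no (survived or not)
X = [{"status": "first", "age": "adult", "sex": "female"},
     {"status": "first", "age": "adult", "sex": "male"},
     {"status": "third", "age": "adult", "sex": "male"},
     {"status": "third", "age": "child", "sex": "female"}]
y = ["yes", "no", "no", "yes"]

best = max(["status", "age", "sex"], key=lambda a: info_gain(X, y, a))
print(best)  # sex: it separates this toy sample perfectly
```

On this sample sex has gain H(S) = 1 bit (its partitions are pure), while status has gain 0, so ID3 would split on sex first.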
Simple example
Example on the Titanic dataset: a list of all Titanic passengers. Each passenger is represented as a (status, age, sex) vector, labeled either yes (survived) or no (died). Attribute description:

- status: first, second, third, or crew
- age: adult, child
- sex: male, female

Learn to predict whether an unlabeled x survived or died.
Simple example

ID3 grows the tree one split at a time (reconstructed from the slide diagrams):

1. Start with a single leaf predicting the majority class: no.
2. Split on sex: female → yes, male → no.
3. Refine the female branch on status: first → yes, second → yes, third → no, crew → yes.
4. Refine the male branch on status: no for every value.
5. Split the female/third leaf on age: adult → no, child → no.
6. Split the male/first leaf on age: adult → no, child → yes.
7. Split the male/third leaf on age: adult → no, child → no.

The final tree:

sex = female:
    status = first  → yes
    status = second → yes
    status = third  → age: adult → no, child → no
    status = crew   → yes
sex = male:
    status = first  → age: adult → no, child → yes
    status = second → no
    status = third  → age: adult → no, child → no
    status = crew   → no
Incremental decision tree learning
In the data-stream setting only a small subset of the examples is available at any time. Using the Hoeffding inequality we can, with high probability, find the truly best attribute from a sample. Suppose A1 and A2 are the attributes with the highest estimates G(A1) and G(A2). If G(A1) − G(A2) > ε, then A1 is truly the best attribute with probability at least 1 − δ, for 1 − δ ∈ (0, 1) and

ε = √(R² log(1/δ) / (2n)),

where R is the range of the measure G and n is the number of examples seen. This is the main idea behind the VFDT learner [Domingos and Hulten, 2000].
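To make the bound concrete, here is a small sketch of the ε computation and the resulting split decision. The δ and τ defaults are illustrative; R = log₂(C) is the range of information gain for C classes:

```python
from math import log, log2, sqrt

def hoeffding_epsilon(r, delta, n):
    """With probability >= 1 - delta, the true mean of a variable with
    range r lies within epsilon of its mean over n observations."""
    return sqrt(r * r * log(1.0 / delta) / (2.0 * n))

def should_split(g1, g2, n, n_classes=2, delta=1e-7, tau=0.05):
    """Split if the best attribute beats the runner-up by more than epsilon,
    or if epsilon itself has shrunk below the tie-break threshold tau."""
    eps = hoeffding_epsilon(log2(n_classes), delta, n)
    return (g1 - g2) > eps or eps < tau

print(should_split(0.30, 0.20, n=100))    # too few examples: don't split yet
print(should_split(0.30, 0.20, n=5000))   # the gap now exceeds epsilon
```

Note that ε shrinks like 1/√n, so any fixed gain gap eventually becomes statistically significant as more examples arrive.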
![Page 33: Incremental learning of decision trees from time-changing data streams](https://reader033.fdocuments.us/reader033/viewer/2022042701/55be3259bb61eb66498b4693/html5/thumbnails/33.jpg)
Introduction Incremental decision tree learning Evaluation Results Appendix References
Incremental classification tree learning
Incremental decision tree learning
In data stream world we only have a small subset of examplesavailableUsing Hoeffding inequality we can find the truly best attributefrom sample with high probabilitySuppose A1 and A2 are attributes with highest estimatesG(A1) and G(A2)
If G(A1)− G(A2) > ε, then A1 is truly best with probability atleast 1− δ for 1− δ ∈ (0, 1) and
ε =
√R2 log(1/δ)
2n
This is the main idea behind VFDT learner [Domingos andHulten, 2000]
![Page 34: Incremental learning of decision trees from time-changing data streams](https://reader033.fdocuments.us/reader033/viewer/2022042701/55be3259bb61eb66498b4693/html5/thumbnails/34.jpg)
Introduction Incremental decision tree learning Evaluation Results Appendix References
Incremental classification tree learning
Incremental decision tree learning
In data stream world we only have a small subset of examplesavailableUsing Hoeffding inequality we can find the truly best attributefrom sample with high probabilitySuppose A1 and A2 are attributes with highest estimatesG(A1) and G(A2)
If G(A1)− G(A2) > ε, then A1 is truly best with probability atleast 1− δ for 1− δ ∈ (0, 1) and
ε =
√R2 log(1/δ)
2n
This is the main idea behind VFDT learner [Domingos andHulten, 2000]
![Page 35: Incremental learning of decision trees from time-changing data streams](https://reader033.fdocuments.us/reader033/viewer/2022042701/55be3259bb61eb66498b4693/html5/thumbnails/35.jpg)
Introduction Incremental decision tree learning Evaluation Results Appendix References
Incremental classification tree learning
Incremental decision tree learning
In data stream world we only have a small subset of examplesavailableUsing Hoeffding inequality we can find the truly best attributefrom sample with high probabilitySuppose A1 and A2 are attributes with highest estimatesG(A1) and G(A2)
If G(A1)− G(A2) > ε, then A1 is truly best with probability atleast 1− δ for 1− δ ∈ (0, 1) and
ε =
√R2 log(1/δ)
2n
This is the main idea behind VFDT learner [Domingos andHulten, 2000]
![Page 36: Incremental learning of decision trees from time-changing data streams](https://reader033.fdocuments.us/reader033/viewer/2022042701/55be3259bb61eb66498b4693/html5/thumbnails/36.jpg)
Introduction Incremental decision tree learning Evaluation Results Appendix References
Incremental classification tree learning
Incremental decision tree learning
In data stream world we only have a small subset of examplesavailableUsing Hoeffding inequality we can find the truly best attributefrom sample with high probabilitySuppose A1 and A2 are attributes with highest estimatesG(A1) and G(A2)
If G(A1)− G(A2) > ε, then A1 is truly best with probability atleast 1− δ for 1− δ ∈ (0, 1) and
ε =
√R2 log(1/δ)
2n
This is the main idea behind VFDT learner [Domingos andHulten, 2000]
![Page 37: Incremental learning of decision trees from time-changing data streams](https://reader033.fdocuments.us/reader033/viewer/2022042701/55be3259bb61eb66498b4693/html5/thumbnails/37.jpg)
VFDT algorithm
1: Let HT be a tree consisting of a single root node
2: for each example x ∈ S do
3:   Sort x down the tree to a leaf ℓ and update ℓ's sufficient statistics
4:   if nℓ mod nm = 0 and the examples seen at ℓ have nonzero entropy then
5:     Let Xa and Xb be the attributes with the two highest estimates Gℓ(Xi)
6:     Compute ε := √(R² log(1/δ) / (2nℓ))   {here R = log₂ C for C classes}
7:     if Gℓ(Xa) − Gℓ(Xb) > ε or Gℓ(Xa) − Gℓ(Xb) ≤ ε < τ then
8:       Turn leaf ℓ into a node that tests on Xa
9:       for each value of Xa do
10:        Add a leaf and initialize its sufficient statistics
11:      end for
12:    end if
13:  end if
14: end for
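The pseudocode above can be sketched as a minimal Hoeffding-tree updater for categorical attributes. This is an illustrative reimplementation, not the authors' code: the class names and defaults are assumptions, and prediction, numeric attributes, and splitting of single-attribute leaves are omitted:

```python
from collections import defaultdict
from math import log, log2, sqrt

class Leaf:
    """Leaf with sufficient statistics: class counts per (attribute, value)."""
    def __init__(self, attrs):
        self.attrs = list(attrs)
        self.n = 0
        self.classes = defaultdict(int)
        self.stats = {a: defaultdict(lambda: defaultdict(int)) for a in self.attrs}

class Node:
    """Internal node testing one attribute; child leaves created on demand."""
    def __init__(self, attr, child_factory):
        self.attr = attr
        self.children = defaultdict(child_factory)

def entropy(counts):
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values() if c) if n else 0.0

def gain(leaf, attr):
    rem = sum(sum(cc.values()) / leaf.n * entropy(cc)
              for cc in leaf.stats[attr].values())
    return entropy(leaf.classes) - rem

def vfdt_update(tree, x, y, n_min=200, delta=1e-6, tau=0.05, n_classes=2):
    """Process one labeled example (x, y); return the (possibly new) root."""
    node, parent, branch = tree, None, None
    while isinstance(node, Node):              # sort x down to a leaf
        parent, branch = node, x[node.attr]
        node = node.children[branch]
    leaf = node
    leaf.n += 1
    leaf.classes[y] += 1
    for a in leaf.attrs:                       # update sufficient statistics
        leaf.stats[a][x[a]][y] += 1
    if leaf.n % n_min == 0 and entropy(leaf.classes) > 0 and len(leaf.attrs) > 1:
        ranked = sorted(leaf.attrs, key=lambda a: gain(leaf, a))
        best, second = ranked[-1], ranked[-2]
        eps = sqrt(log2(n_classes) ** 2 * log(1 / delta) / (2 * leaf.n))
        if gain(leaf, best) - gain(leaf, second) > eps or eps < tau:
            rest = [a for a in leaf.attrs if a != best]
            new = Node(best, lambda: Leaf(rest))   # turn leaf into a test node
            if parent is None:
                return new                         # the root itself was split
            parent.children[branch] = new
    return tree

# Toy stream where the label depends only on sex: the root splits on sex
# as soon as the first n_min examples have been seen.
import random
random.seed(0)
tree = Leaf(["sex", "status"])
for _ in range(1000):
    x = {"sex": random.choice(["male", "female"]),
         "status": random.choice(["first", "third"])}
    tree = vfdt_update(tree, x, "yes" if x["sex"] == "female" else "no")
print(tree.attr)
```

Only the per-leaf counts are stored, never the examples themselves, which is what makes the algorithm fit the single-pass, bounded-memory constraints of the stream setting.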
- The algorithm does not adapt to changes in the distribution
- Numeric attributes are handled with online discretization
- The τ parameter was introduced to resolve cases where two attributes are almost equally good
- G(Ai) is recomputed only periodically (typically nm = 200)
- With high probability, the VFDT-induced tree uses the same sequence of tests as a (hypothetical) batch-induced tree to classify a randomly chosen example [Domingos and Hulten, 2000]
Big picture of the CVFDT algorithm

[Diagram] CVFDT keeps a sliding window W over the stream: new examples enter the window as old ones leave it. Starting from the root node, an internal node T routes examples down its branches a1 … ad. When the split at a node T′ may have become outdated, CVFDT grows alternate trees at T′ alongside the existing subtrees of T′.
Assessing learning performance
Roughly, we distinguish two approaches [Gama et al., 2013]:

Holdout error estimation
The idea is to periodically (with period, say, 20 000) sacrifice m := 2 000 examples and use them to estimate the classification error:

H_m := (1/m) Σ_{i=k}^{k+m} L(ŷ_i, y_i)

Prequential error estimation (also known as "test-then-train")
Let α ∈ (0, 1] be a fading factor and let A be a classifier. Define the estimated prequential error P_α(i):

S_A^α(i) := L(ŷ_i, y_i) + α·L(ŷ_{i−1}, y_{i−1}) + … + α^{i−1}·L(ŷ_1, y_1),
N_α(i) := 1 + α + … + α^{i−1},
P_α(i) := S_A^α(i) / N_α(i).
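The prequential quantities can be maintained incrementally, since S_A^α(i) = α·S_A^α(i−1) + L(ŷ_i, y_i) and N_α(i) = α·N_α(i−1) + 1. A minimal sketch; the class name and the α value are illustrative:

```python
class PrequentialError:
    """Fading prequential ('test-then-train') error estimate.

    Keeps the alpha-faded loss sum S and weight sum N, so the estimate
    P = S / N is available after every example without storing history.
    """
    def __init__(self, alpha=0.995):
        self.alpha = alpha
        self.s = 0.0   # S_A^alpha(i), faded loss sum
        self.n = 0.0   # N_alpha(i), faded weight sum

    def update(self, y_pred, y_true):
        loss = 0.0 if y_pred == y_true else 1.0   # 0/1 loss
        self.s = self.alpha * self.s + loss
        self.n = self.alpha * self.n + 1.0
        return self.s / self.n

est = PrequentialError(alpha=0.9)
# One early mistake, then correct predictions: the faded estimate forgets it.
errs = [est.update(p, t) for p, t in [(1, 0), (1, 1), (1, 1), (1, 1), (1, 1)]]
print([round(e, 3) for e in errs])
```

With α < 1 old losses are progressively discounted, which is what lets the estimate track a classifier's performance under concept drift; α = 1 recovers the plain running average.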
Comparing learning algorithms

Let A and B be learners and let S_A and S_B be aligned error sequences. Define

Q_i^α(A, B) := log(S_A^α(i) / S_B^α(i)).

Interpretation of the Q-statistic:
Q_i^α(A, B) < 0 means that A is better than B,
Q_i^α(A, B) > 0 means that B is better than A,
Q_i^α(A, B) = 0 means A and B perform equally well.

Here |Q_i^α(A, B)| measures the strength of the difference, i.e. how much better one learner is than the other. The Wilcoxon test tests the null hypothesis that the vector of Q-statistics comes from a zero-median distribution. For all tests we took significance level α := 0.0001.
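As a sketch, the Q-statistic itself is a one-liner over the two aligned fading-error sequences (function name illustrative; the talk then feeds these values to a Wilcoxon signed-rank test, e.g. `scipy.stats.wilcoxon`, against the zero-median null):

```python
import math

def q_statistic(errors_a, errors_b):
    """Q_i(A, B) = log(S_A(i) / S_B(i)) for aligned fading-error sequences.

    Negative values favour learner A, positive values favour learner B.
    """
    return [math.log(sa / sb) for sa, sb in zip(errors_a, errors_b)]

# A's fading error is consistently half of B's, so every Q_i = log(1/2) < 0.
q = q_statistic([0.1, 0.2, 0.15], [0.2, 0.4, 0.3])
print(all(qi < 0 for qi in q))  # True
```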
Data description

We evaluated the VFDT and CVFDT learners on electricity-demand data for New York State. We discretized the target attribute load to get a 5-class classification problem. Other attributes: the numeric attributes hourOfDay, dayOfWeek and month, computed from date; name of area, an 11-valued discrete attribute; and PTID, a numeric attribute. We took data for the last 10 years and tried to predict the demand for the next measurement. Altogether around 13 878 974 records (about 1.3 GB of uncompressed data).
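The talk does not say how load was discretized into 5 classes; one common choice, shown here purely as an assumption, is equal-frequency (quantile) binning:

```python
def quantile_bin_edges(values, k=5):
    """k-1 interior cut points splitting sorted values into k equal-frequency bins."""
    s = sorted(values)
    return [s[len(s) * i // k] for i in range(1, k)]

def to_class(x, edges):
    """Map a numeric value to a class label in 0..len(edges)."""
    return sum(x >= e for e in edges)

edges = quantile_bin_edges(range(100), k=5)  # [20, 40, 60, 80]
print(to_class(55, edges))  # 2
```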
Load zones

Figure: New York Control Area load zones (A - WEST, B - GENESE, C - CENTRL, D - NORTH, E - MHK VL, F - CAPITL, G - HUD VL, H - MILLWD, I - DUNWOD, J - N.Y.C., K - LONGIL). Taken from NYISO (http://www.nyiso.com/public/index.jsp).
One month demand for a single area

Figure: One month of demand for a single area. Axes (translated from Slovenian): x, sequential measurement number; y, demand at that moment.
One year demand for a single area

Figure: One year of demand for a single area. Axes (translated from Slovenian): x, sequential measurement number; y, demand at that moment.
Global demand for a single area

Figure: Demand for a single area over the whole period. Axes (translated from Slovenian): x, sequential measurement number; y, demand at that moment.
Target variable distribution

Figure: Histogram of the target variable. Axes (translated from Slovenian): x, demand; y, frequency of demand.
Results

| Method | Learner A / Learner B | Median | p-value |
| --- | --- | --- | --- |
| Holdout estimate | VFDT-MAJ / CVFDT-MAJ | µ₁/₂ = −0.4285 | p < 0.0001 |
| Holdout estimate | VFDT-NB / CVFDT-NB | µ₁/₂ = 0 | p = 0.6538 |
| Holdout estimate | CVFDT-MAJ / CVFDT-NB | µ₁/₂ = 0.4410 | p < 0.0001 |
| Fading factors | VFDT-MAJ / CVFDT-MAJ | µ₁/₂ = −0.377 | p < 0.0001 |
| Fading factors | VFDT-NB / CVFDT-NB | µ₁/₂ = 0.0297 | p = 0.1424 |
| Fading factors | CVFDT-MAJ / CVFDT-NB | µ₁/₂ = 0.3819 | p < 0.0001 |

Table: Results of the Wilcoxon test when testing the hypothesis that the median of the Q-statistics is zero.
CVFDT-MAJ versus CVFDT-NB

Figure: Fading error of CVFDT-MAJ and CVFDT-NB. Axes (translated from Slovenian): x, sequential example number in the stream; y, fading error.
CVFDT-MAJ versus CVFDT-NB

Figure: Q-statistic for CVFDT-MAJ versus CVFDT-NB. Axes (translated from Slovenian): x, sequential example number in the stream; y, value of the Q-statistic.
VFDT-NB versus CVFDT-NB

Figure: Fading error of VFDT-NB and CVFDT-NB. Axes (translated from Slovenian): x, sequential example number in the stream; y, fading error.
VFDT-NB versus CVFDT-NB

Figure: Q-statistic for VFDT-NB versus CVFDT-NB. Axes (translated from Slovenian): x, sequential example number in the stream; y, value of the Q-statistic.
VFDT-MAJ versus CVFDT-MAJ

Figure: Fading error of CVFDT-MAJ and VFDT-MAJ. Axes (translated from Slovenian): x, sequential example number in the stream; y, fading error.
VFDT-MAJ versus CVFDT-MAJ

Figure: Q-statistic for VFDT-MAJ versus CVFDT-MAJ. Axes (translated from Slovenian): x, sequential example number in the stream; y, value of the Q-statistic.
The End

Thank you for your attention!
Appendix

Hoeffding's inequality

Theorem ([Hoeffding, 1963]). Let S := X_1 + X_2 + … + X_n be a sum of independent bounded random variables a_i ≤ X_i ≤ b_i and let ε > 0 be a positive real number. Then

P(S − E[S] ≥ nε) ≤ exp(−2n²ε² / ∑_{i=1}^{n} (b_i − a_i)²).   (1)
Corollary. Let S := X_1 + X_2 + … + X_n be a sum of independent bounded random variables a ≤ X_i ≤ b and let ε > 0 be a positive real number. For R := b − a we have

P(S − E[S] ≥ nε) ≤ exp(−2nε² / R²).   (2)
Incremental decision tree learning

In the data stream setting only a small subset of the examples is available. Using Hoeffding's inequality we can, with high probability, find the truly best attribute from a sample. Let a ≤ X ≤ b be a bounded random variable and let X_1, X_2, …, X_n be its measurements. Let μ̂ := (X_1 + X_2 + … + X_n)/n be the sample mean and μ := E[X] the true mean, and let 1 − δ ∈ (0, 1) be the desired confidence level. By Hoeffding's inequality, we have P(μ ≥ μ̂ − ε) ≥ 1 − δ for

ε = √((b − a)² log(1/δ) / (2n)).
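The bound shrinks as 1/√n, which is what lets a Hoeffding tree decide when it has seen enough examples to commit to a split. A small sketch (the function names and the gain-gap rule are a VFDT-style illustration, not code from the talk):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * log(1/delta) / (2n)) for n observations of a
    variable with range R = b - a, at confidence level 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best, gain_second, value_range, delta, n):
    """Split once the observed gain gap exceeds epsilon: the sample ranking
    of the two attributes then matches the true ranking w.h.p."""
    return gain_best - gain_second > hoeffding_bound(value_range, delta, n)

eps = hoeffding_bound(1.0, 1e-7, 5000)  # about 0.04 here
print(should_split(0.30, 0.22, 1.0, 1e-7, 5000))  # True: gap 0.08 > eps
```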
Incremental regression tree learning

What about regression?

Done at JSI by Elena Ikonomovska [Ikonomovska, 2012]. Regression trees predict a real number instead of a class. Define the standard deviation reduction

sdr(A, S) := σ(S) − ∑_{i=1}^{d} (|S_i| / |S|) σ(S_i),

where S_i := {x ∈ S : A(x) = a_i} and σ(S) denotes the standard deviation. Pick the attribute that maximizes the SDR: A* := argmax_{A} sdr(A, S). Again, using Hoeffding's inequality, we can find the best attribute with high probability.
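The sdr computation follows directly from the definition; a sketch using the population standard deviation (helper name illustrative):

```python
from collections import defaultdict
from statistics import pstdev

def sdr(attribute_values, targets):
    """Standard deviation reduction for a discrete attribute:
    sdr = sigma(S) - sum_i |S_i|/|S| * sigma(S_i)."""
    groups = defaultdict(list)
    for a, y in zip(attribute_values, targets):
        groups[a].append(y)
    n = len(targets)
    return pstdev(targets) - sum(len(g) / n * pstdev(g) for g in groups.values())

# A perfectly predictive attribute makes every subset constant, so the
# reduction equals the full standard deviation of the targets.
print(sdr(["a", "a", "b", "b"], [1.0, 1.0, 5.0, 5.0]))  # 2.0
```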
What about regression?

Let A and B be the best and the second-best attributes, respectively. Then r := sdr(B, S) / sdr(A, S) is a random variable with r ∈ [0, 1]. Let r_1, r_2, …, r_n be such ratios for the last n examples. Now pick 1 − δ ∈ (0, 1) and let

ε = √(log(1/δ) / (2n)).

By Hoeffding's inequality, the true mean E[r] lies in [r̄ − ε, r̄ + ε] with probability at least 1 − δ, where r̄ := (r_1 + r_2 + … + r_n)/n.
What about regression?

Now we can derive a split criterion
Let S_A and S_B be the standard deviation reductions after testing on A and B, respectively
If S_B/S_A < 1 − ε, then A is truly the best attribute with probability at least 1 − δ (see [Ikonomovska, 2012])
When predicting the target variable, sort the example down the tree and return the average of the training examples at the given leaf
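The split test and the leaf prediction above can be sketched together in Python (helper names are illustrative, not from [Ikonomovska, 2012]):

```python
from math import log, sqrt

def should_split(sdr_best, sdr_second, n, delta):
    """Decide whether the observed best attribute is truly best.

    Splits when the ratio of the second-best to the best standard
    deviation reduction falls below 1 - epsilon, where epsilon is the
    Hoeffding bound for n examples at confidence 1 - delta.
    """
    epsilon = sqrt(log(1.0 / delta) / (2.0 * n))
    return sdr_second / sdr_best < 1.0 - epsilon

def predict(leaf_targets):
    """Regression-tree leaf prediction: the mean target value of the
    training examples that reached the leaf."""
    return sum(leaf_targets) / len(leaf_targets)
```

When the two attributes are nearly tied (S_B/S_A close to 1), the test withholds the split and waits for more examples to shrink ε; a clear winner triggers the split early.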
Pedro Domingos and Geoff Hulten. Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’00, pages 71–80, New York, NY, USA, 2000. ACM. ISBN 1-58113-233-6. doi: 10.1145/347090.347107. URL http://doi.acm.org/10.1145/347090.347107.
Joao Gama, Raquel Sebastiao, and Pedro Pereira Rodrigues. On evaluating stream learning algorithms. Machine Learning, 90(3):317–346, 2013.
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
Elena Ikonomovska. Algoritmi za ucenje regresijskih dreves in ansamblov iz spremenljivih podatkovnih tokov [Learning algorithms for regression trees and ensembles from evolving data streams]. PhD thesis, Jozef Stefan International Postgraduate School, 2012.
J. Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.