Transcript of Week 1, video 3: Classifiers, Part 1
Week 1, video 3: Classifiers, Part 1
Prediction
- Develop a model which can infer a single aspect of the data (the predicted variable) from some combination of other aspects of the data (the predictor variables)
- Sometimes used to predict the future
- Sometimes used to make inferences about the present
Classification
- There is something you want to predict ("the label")
- The thing you want to predict is categorical
  - The answer is one of a set of categories, not a number
  - CORRECT/WRONG (sometimes expressed as 0, 1)
    - We'll talk about this specific problem later in the course, within latent knowledge estimation
  - HELP REQUEST/WORKED EXAMPLE REQUEST/ATTEMPT TO SOLVE
  - WILL DROP OUT/WON'T DROP OUT
  - WILL ENROLL IN MOOC A, B, C, D, E, F, or G
Where do those labels come from?
- In-software performance
- School records
- Test data
- Survey data
- Field observations or video coding
- Text replays
Classification
- Associated with each label is a set of "features," which you may be able to use to predict the label

| Skill | pknow | time | totalactions | right |
|---|---|---|---|---|
| ENTERINGGIVEN | 0.704 | 9 | 1 | WRONG |
| ENTERINGGIVEN | 0.502 | 10 | 2 | RIGHT |
| USEDIFFNUM | 0.049 | 6 | 1 | WRONG |
| ENTERINGGIVEN | 0.967 | 7 | 3 | RIGHT |
| REMOVECOEFF | 0.792 | 16 | 1 | WRONG |
| REMOVECOEFF | 0.792 | 13 | 2 | RIGHT |
| USEDIFFNUM | 0.073 | 5 | 2 | RIGHT |
| … | | | | |
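To make the feature/label split concrete, here is a minimal Python sketch that represents the first few rows of the table above as records and separates the predictor variables from the label (the field names simply mirror the table's column headers):

```python
# A few rows of the slide's feature table as Python records.
rows = [
    {"skill": "ENTERINGGIVEN", "pknow": 0.704, "time": 9,  "totalactions": 1, "right": "WRONG"},
    {"skill": "ENTERINGGIVEN", "pknow": 0.502, "time": 10, "totalactions": 2, "right": "RIGHT"},
    {"skill": "USEDIFFNUM",    "pknow": 0.049, "time": 6,  "totalactions": 1, "right": "WRONG"},
]

# Separate the predictor variables (features) from the predicted variable (label)
features = [{k: v for k, v in r.items() if k != "right"} for r in rows]
labels = [r["right"] for r in rows]
```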
Classification
- The basic idea of a classifier is to determine which features, in which combination, can predict the label

| Skill | pknow | time | totalactions | right |
|---|---|---|---|---|
| ENTERINGGIVEN | 0.704 | 9 | 1 | WRONG |
| ENTERINGGIVEN | 0.502 | 10 | 2 | RIGHT |
| USEDIFFNUM | 0.049 | 6 | 1 | WRONG |
| ENTERINGGIVEN | 0.967 | 7 | 3 | RIGHT |
| REMOVECOEFF | 0.792 | 16 | 1 | WRONG |
| REMOVECOEFF | 0.792 | 13 | 2 | RIGHT |
| USEDIFFNUM | 0.073 | 5 | 2 | RIGHT |
| … | | | | |
Classifiers
- There are hundreds of classification algorithms
- A good data mining package will have many implementations
  - RapidMiner
  - SAS Enterprise Miner
  - Weka
  - KEEL
Classification
- Of course, usually there are more than 4 features
- And more than 7 actions/data points
Domain-Specificity
- Specific algorithms work better for specific domains and problems
- We often have hunches for why that is
- But it's more in the realm of "lore" than really "engineering"
Some algorithms I find useful
- Step Regression
- Logistic Regression
- J48/C4.5 Decision Trees
- JRip Decision Rules
- K* Instance-Based Classifiers
- There are many others!
Step Regression
- Not step-wise regression
- Used for binary classification (0, 1)
Step Regression
- Fits a linear regression function
  - (as discussed in the previous class)
  - with an arbitrary cut-off
- Selects parameters
- Assigns a weight to each parameter
- Computes a numerical value
- Then all values below 0.5 are treated as 0, and all values >= 0.5 are treated as 1
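The prediction step described above can be sketched in Python. This is a minimal illustration, not any package's actual implementation; the hypothetical `step_regression_predict` takes already-fitted weights, since the training side (parameter selection and weight fitting) is not shown here:

```python
def step_regression_predict(weights, intercept, x, cutoff=0.5):
    """Compute the fitted linear regression value, then threshold it
    at the cutoff so the output is a 0/1 class rather than a number."""
    y = intercept + sum(w * xi for w, xi in zip(weights, x))
    return 1 if y >= cutoff else 0
```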
Example
- Y = 0.5a + 0.7b - 0.2c + 0.4d + 0.3
- Cut-off 0.5

| a | b | c | d | Y |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | |
| 0 | 0 | 0 | 0 | |
| -1 | -1 | 1 | 3 | |
Example
- Y = 0.5a + 0.7b - 0.2c + 0.4d + 0.3
- Cut-off 0.5

| a | b | c | d | Y |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 0 | 0 | 0 | 0 | |
| -1 | -1 | 1 | 3 | |
Example
- Y = 0.5a + 0.7b - 0.2c + 0.4d + 0.3
- Cut-off 0.5

| a | b | c | d | Y |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 0 | 0 | 0 | 0 | 0 |
| -1 | -1 | 1 | 3 | |
Example
- Y = 0.5a + 0.7b - 0.2c + 0.4d + 0.3
- Cut-off 0.5

| a | b | c | d | Y |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 0 | 0 | 0 | 0 | 0 |
| -1 | -1 | 1 | 3 | 0 |
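The completed table can be checked with a short Python sketch using the slide's function and cut-off (the raw values are 1.7, 0.3, and 0.1, so only the first row clears 0.5):

```python
def y_raw(a, b, c, d):
    # The slide's fitted regression function
    return 0.5 * a + 0.7 * b - 0.2 * c + 0.4 * d + 0.3

def classify(a, b, c, d, cutoff=0.5):
    # Threshold the raw value into a 0/1 class
    return 1 if y_raw(a, b, c, d) >= cutoff else 0

print(classify(1, 1, 1, 1))    # raw value 1.7 -> 1
print(classify(0, 0, 0, 0))    # raw value 0.3 -> 0
print(classify(-1, -1, 1, 3))  # raw value 0.1 -> 0
```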
Quiz
- Y = 0.5a + 0.7b - 0.2c + 0.4d + 0.3
- Cut-off 0.5

| a | b | c | d | Y |
|---|---|---|---|---|
| 2 | -1 | 0 | 1 | |
Note
- In RapidMiner, step regression is conducted by running linear regression on binary data
- Other packages implement it with different functions
Step regression: should you use it?
- Step regression is not preferred by statisticians, due to the lack of a closed-form expression
- But it often does better in EDM, due to less over-fitting
Logistic Regression
- Another algorithm for binary classification (0, 1)
Logistic Regression
- Given a specific set of values of the predictor variables
- Fits a logistic function to the data to find the frequency/odds of a specific value of the dependent variable
Logistic Regression
[Figure: the logistic function p(m) plotted for m from -4 to 4, an S-shaped curve rising from 0 toward 1]
Logistic Regression
m = a0 + a1v1 + a2v2 + a3v3 + a4v4…
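The linear term m is then passed through the logistic function, p(m) = 1 / (1 + e^(-m)), which produces the S-shaped curve shown earlier. A minimal Python sketch (the helper names here are illustrative, not from any particular package):

```python
import math

def p(m):
    # The logistic function: maps any real m to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-m))

def m_value(intercept, weights, values):
    # m = a0 + a1*v1 + a2*v2 + ... from the slide's formula
    return intercept + sum(a * v for a, v in zip(weights, values))
```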
Logistic Regression
m = 0.2A + 0.3B + 0.5C
Logistic Regression
m = 0.2A + 0.3B + 0.5C
| A | B | C | M | P(M) |
|---|---|---|---|---|
| 0 | 0 | 0 | | |
Logistic Regression
m = 0.2A + 0.3B + 0.5C
| A | B | C | M | P(M) |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0.5 |
Logistic Regression
m = 0.2A + 0.3B + 0.5C
| A | B | C | M | P(M) |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0.73 |
Logistic Regression
m = 0.2A + 0.3B + 0.5C
| A | B | C | M | P(M) |
|---|---|---|---|---|
| -1 | -1 | -1 | -1 | 0.27 |
Logistic Regression
m = 0.2A + 0.3B + 0.5C
| A | B | C | M | P(M) |
|---|---|---|---|---|
| 2 | 2 | 2 | 2 | 0.88 |
Logistic Regression
m = 0.2A + 0.3B + 0.5C
| A | B | C | M | P(M) |
|---|---|---|---|---|
| 3 | 3 | 3 | 3 | 0.95 |
Logistic Regression
m = 0.2A + 0.3B + 0.5C
| A | B | C | M | P(M) |
|---|---|---|---|---|
| 50 | 50 | 50 | 50 | ~1 |
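A quick Python check reproduces the P(M) column from these slides. With m = 0.2A + 0.3B + 0.5C and A = B = C, m simply equals the shared value, so we can evaluate the logistic function directly:

```python
import math

def p(m):
    # The logistic function
    return 1.0 / (1.0 + math.exp(-m))

# m values from the slides: 0, 1, -1, 2, 3, 50
for m in (0, 1, -1, 2, 3, 50):
    print(m, round(p(m), 2))  # reproduces 0.5, 0.73, 0.27, 0.88, 0.95, 1.0
```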
Relatively conservative
- Thanks to its simple functional form, logistic regression is a relatively conservative algorithm
  - I'll explain this in more detail later in the course
Good for
- Cases where changes in the value of predictor variables have predictable effects on the probability of the predicted variable's class
- m = 0.2A + 0.3B + 0.5C
- Higher A always leads to higher probability
  - But there are some data sets where this isn't true!
What about interaction effects?
- A = Bad
- B = Bad
- A + B = Good
What about interaction effects?
- Ineffective Educational Software = Bad
- Off-Task Behavior = Bad
- Ineffective Educational Software PLUS Off-Task Behavior = Good
Logistic and Step Regression are good when interactions are not particularly common
- They can be given interaction effects through automated feature distillation
  - We'll discuss this later
- But they are not particularly optimal for this
What about interaction effects?
- Fast Responses + Material Student Already Knows -> Associated with Better Learning
- Fast Responses + Material Student Does Not Know -> Associated with Worse Learning
Decision Trees
- An approach that explicitly deals with interaction effects
Decision Tree
- KNOWLEDGE
  - <0.5 -> TIME
    - <6 s -> RIGHT
    - >=6 s -> WRONG
  - >=0.5 -> TOTALACTIONS
    - <4 -> RIGHT
    - >=4 -> WRONG

| Skill | knowledge | time | totalactions | right? |
|---|---|---|---|---|
| COMPUTESLOPE | 0.544 | 9 | 1 | ? |
Decision Tree
- KNOWLEDGE
  - <0.5 -> TIME
    - <6 s -> RIGHT
    - >=6 s -> WRONG
  - >=0.5 -> TOTALACTIONS
    - <4 -> RIGHT
    - >=4 -> WRONG

| Skill | knowledge | time | totalactions | right? |
|---|---|---|---|---|
| COMPUTESLOPE | 0.544 | 9 | 1 | RIGHT |
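Traversing such a tree is just a chain of nested conditionals. In the sketch below, the split points come from the slide; the transcript's rendering of the leaf labels is garbled, so treat the leaf assignments as an assumption, except for the totalactions < 4 -> RIGHT leaf, which the slide's own worked example (knowledge 0.544, totalactions 1, answer RIGHT) confirms:

```python
def classify_action(knowledge, time, totalactions):
    """Traverse the slide's decision tree. Leaf labels are partly
    reconstructed (assumed), as noted above."""
    if knowledge < 0.5:
        # Low estimated knowledge: split on response time
        return "RIGHT" if time < 6 else "WRONG"
    # High estimated knowledge: split on number of actions
    return "RIGHT" if totalactions < 4 else "WRONG"

print(classify_action(0.544, 9, 1))  # the COMPUTESLOPE row -> RIGHT
```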
Decision Tree
- KNOWLEDGE
  - <0.5 -> TIME
    - <6 s -> RIGHT
    - >=6 s -> WRONG
  - >=0.5 -> TOTALACTIONS
    - <4 -> RIGHT
    - >=4 -> WRONG

| Skill | knowledge | time | totalactions | right? |
|---|---|---|---|---|
| COMPUTESLOPE | 0.444 | 9 | 1 | ? |
Decision Tree Algorithms
- There are several
- I usually use J48, which is an open-source re-implementation in Weka/RapidMiner of C4.5 (Quinlan, 1993)
J48/C4.5
- Can handle both numerical and categorical predictor variables
  - Tries to find the optimal split in numerical variables
- Repeatedly looks for the variable that best splits the data in terms of predictive power
- Later prunes out branches that turn out to have low predictive power
- Note that different branches can have different features!
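The "best split" search can be illustrated with a short, self-contained Python sketch of information-gain splitting on one numeric variable. This is a simplification of what C4.5-style algorithms do: the real algorithm uses gain ratio rather than raw gain, handles categorical attributes, and prunes afterwards:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_numeric_split(values, labels):
    """Find the threshold on one numeric variable that maximizes
    information gain (split into v < t versus v >= t)."""
    base = entropy(labels)
    best_gain, best_threshold = 0.0, None
    for t in sorted(set(values))[1:]:  # candidate cut points
        left = [l for v, l in zip(values, labels) if v < t]
        right = [l for v, l in zip(values, labels) if v >= t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - remainder > best_gain:
            best_gain, best_threshold = base - remainder, t
    return best_threshold, best_gain

# Toy data: pknow-like values with RIGHT/WRONG labels; a clean split exists
t, g = best_numeric_split([0.1, 0.2, 0.4, 0.6, 0.8, 0.9],
                          ["WRONG", "WRONG", "WRONG", "RIGHT", "RIGHT", "RIGHT"])
print(t, g)  # -> 0.6 1.0 (a perfect split: full information gain)
```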
Can be adjusted…
- To split based on more or less evidence
- To prune based on more or less predictive power
Relatively conservative
- Thanks to the pruning step, J48 is a relatively conservative algorithm
  - We'll discuss conservatism in a later class
Good when data has natural splits
[Two charts plotting values across points 1 to 11: one where the data falls into naturally separated groups, one where it does not]
Good when multi-level interactions are common
Good when same construct can be arrived at in multiple ways
- A student is likely to drop out of college when he
  - Starts assignments early but lacks prerequisites
- OR when he
  - Starts assignments the day they're due
What variables should you use?
What variables should you use?
- In one sense, the entire point of data mining is to figure out which variables matter
- But some variables have more construct validity or theoretical justification than others; using those variables generally leads to more generalizable models
  - We'll talk more about this in a future lecture
What variables should you use?
- In one sense, the entire point of data mining is to figure out which variables matter
- More urgently, some variables will make your model general only to the data set where they were trained
  - These should not be included in your model
  - They are typically the variables you want to test generalizability across during cross-validation
    - More on this later
Example
- Your model of student off-task behavior should not depend on which student you have
- "If student = BOB, and time > 80 seconds, then…"
- This model won't be useful when you're looking at totally new students
Example
- Your model of student off-task behavior should not depend on which college the student is in
- "If school = University of Pennsylvania, and time > 80 seconds, then…"
- This model won't be useful when you're looking at data from new colleges
Note
- In modern statistics, you often need to explicitly include these types of variables in models to conduct valid statistical testing
- This is a difference between classification and statistical modeling
- We'll discuss it more in future lectures
Later Lectures
- More classification algorithms
- Goodness metrics for comparing classifiers
- Validating classifiers for generalizability
- What does it mean for a classifier to be conservative?
Next Lecture
- Building regressors and classifiers in RapidMiner