Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Demo of using Java
Activity
Second task
• Break into *different* 3-4 person groups than last time
• No overlap allowed
Second task
• Let’s take a quick look at homework C2
Second task
• Make up features for Assignment C2
• You need to
  – Come up with a new feature
  – Justify how you would create it from the data set
  – Justify why it would work
I need a volunteer
• Your task is to write down the features suggested
• And the counts for thumbs up/thumbs down
Now…
• Each group needs to read their favorite feature to the class and justify it
• Who thinks this feature will improve prediction of off-task behavior?
• Who doesn’t?
• Thumbs up, thumbs down!
Questions or comments?
Special Request
• Bring a print-out of your Assignment C2 solution to class on the day it’s due
  – Next Tuesday
Textbook
Automated Feature Generation
• What are the advantages of automated feature generation, as compared to feature engineering?
• What are the disadvantages?
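To make the comparison concrete, here is a minimal sketch of one common form of automated feature generation: mechanically combining every pair of existing numeric features into products and ratios. The feature names and values are hypothetical and do not come from the Assignment C2 data set; the choice of products and ratios is only one assumption about what a generator might produce.

```python
from itertools import combinations


def generate_pairwise_features(row):
    """Return products and ratios for every pair of numeric features in one row."""
    generated = {}
    for a, b in combinations(sorted(row), 2):
        generated[f"{a}*{b}"] = row[a] * row[b]
        # Skip the ratio when the divisor is zero.
        if row[b] != 0:
            generated[f"{a}/{b}"] = row[a] / row[b]
    return generated


# Hypothetical feature names and values, for illustration only.
row = {"time_taken": 12.0, "num_hints": 3.0, "num_errors": 0.0}
new_features = generate_pairwise_features(row)
print(new_features)
```

Three input features already yield six candidates here, and nothing in the procedure says which of them are meaningful. That is the trade-off the two questions above ask about.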
Automated Feature Selection
• What are the advantages of automated feature selection, as compared to having a domain expert decide? (as in the Sao Pedro paper from Monday)
• What are the disadvantages?
A connection to make
• Correlation filtering
• Eliminating collinearity in statistics
• In this case, increasing interpretability and reducing over-fitting go together
  – At least to some positive degree
Outer-loop forward selection
• What are the advantages and disadvantages of doing this?
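For reference while discussing, here is a minimal sketch of the greedy loop behind forward selection: start with no features, add whichever single feature most improves a goodness score, and stop when nothing improves it. The `score` callable is hypothetical (in practice it might be a cross-validated metric), and the toy numbers are made up. The sketch does not model the outer-loop aspect, that is, where the selection sits relative to cross-validation.

```python
def forward_selection(candidates, score):
    """Greedy forward selection.

    candidates: list of feature names.
    score: hypothetical callable that takes a list of feature names and
           returns a goodness value (higher is better).
    """
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current set.
        trials = [(score(selected + [f]), f) for f in remaining]
        trial_score, trial_feature = max(trials)
        if trial_score <= best_score:
            break  # no candidate improves the model, so stop
        selected.append(trial_feature)
        remaining.remove(trial_feature)
        best_score = trial_score
    return selected, best_score


# Toy scorer (made-up numbers): each feature adds a fixed gain, and every
# feature in the set costs a small complexity penalty.
GAIN = {"a": 0.30, "b": 0.20, "c": 0.01}


def toy_score(features):
    return sum(GAIN[f] for f in features) - 0.05 * len(features)


selected, final_score = forward_selection(["a", "b", "c"], toy_score)
print(selected, round(final_score, 2))
```

With the toy scorer, feature "c" is left out because its gain is smaller than the penalty for adding it. Each pass rescores every remaining candidate, so the number of model fits grows roughly with the square of the number of features.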
Knowledge Engineering
• What is knowledge engineering?
Knowledge Engineering
• What is the difference between knowledge engineering and EDM?
Knowledge Engineering
• What is the difference between good knowledge engineering and bad knowledge engineering?
Knowledge Engineering
• What is the difference between (good) knowledge engineering and EDM?
• What are the advantages and disadvantages of each?
How can they be integrated?
FCBF: What Variables will be kept? (Cutoff = 0.65)
• What variables emerge from this table?

        G    H    I    J    K    L    Predicted
G            .7   .8   .8   .4   .3   .72
H                 .8   .7   .6   .5   .38
I                      .8   .3   .4   .82
J                           .8   .1   .75
K                                .5   .65
L                                     .42
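One way to check an answer is to run the filter mechanically. The sketch below assumes a simplified reading of FCBF: rank features by their correlation with the predicted variable, keep the best one, drop every remaining feature whose correlation with it reaches the cutoff, and repeat on what is left. That procedure is an assumption, not taken from the slides, and the published FCBF algorithm uses a different association measure. The numbers are the ones in the table above.

```python
# Pairwise correlations between features, taken from the table above.
PAIRWISE = {
    ("G", "H"): 0.7, ("G", "I"): 0.8, ("G", "J"): 0.8, ("G", "K"): 0.4, ("G", "L"): 0.3,
    ("H", "I"): 0.8, ("H", "J"): 0.7, ("H", "K"): 0.6, ("H", "L"): 0.5,
    ("I", "J"): 0.8, ("I", "K"): 0.3, ("I", "L"): 0.4,
    ("J", "K"): 0.8, ("J", "L"): 0.1,
    ("K", "L"): 0.5,
}

# Correlation of each feature with the predicted variable (last table column).
WITH_PREDICTED = {"G": 0.72, "H": 0.38, "I": 0.82, "J": 0.75, "K": 0.65, "L": 0.42}


def pair_corr(a, b):
    """Look up the correlation for an unordered pair of features."""
    return PAIRWISE[(a, b)] if (a, b) in PAIRWISE else PAIRWISE[(b, a)]


def correlation_filter(with_predicted, cutoff):
    """Keep the best remaining feature, drop features too correlated with it, repeat."""
    remaining = sorted(with_predicted, key=with_predicted.get, reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Assumed rule: a feature is dropped when its correlation with the
        # kept feature is at or above the cutoff.
        remaining = [f for f in remaining if abs(pair_corr(best, f)) < cutoff]
    return kept


kept = correlation_filter(WITH_PREDICTED, cutoff=0.65)
print(kept)  # under the assumptions above: ['I', 'K', 'L']
```

No pairwise value in the table equals exactly 0.65, so it makes no difference here whether the cutoff is treated as inclusive or exclusive.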
Other questions, comments, concerns about textbook?
If you enjoyed today’s class…
• Next fall, I’ll be offering a Feature Engineering Design Studio course…
• Learn the feature engineering process in detail
• Create a model important to your research
• Submit a journal paper
Special Session
• Thursday 9/24, 3pm-4:30pm, Grace Dodge Hall 545
• An Inappropriately Brief Introduction to Frequentist Statistics
What if you can’t attend?
• Email me; I will send you the slides
Should you attend?
• Not mandatory
• Not necessary if you’ve taken a stats class that covers topics like Z, F, and Chi-squared tests
Next Class
• Tuesday, September 29
• Advanced Detector Evaluation and Validation
• Baker, R.S. (2015) Big Data and Education. Ch. 2, V5, V6.
• Rosenthal, R., Rosnow, R.L. (1991) Essentials of Behavioral Research: Methods and Data Analysis, 2nd edition. Ch. 22: Meta-Analysis.
• Rupp, A.A., Gushta, M., Mislevy, R.J., Shaffer, D.W. (2010) Evidence-Centered Design of Epistemic Games: Measurement Principles for Complex Learning Environments. The Journal of Technology, Learning, and Assessment, 8 (4), 4-47.
The End