Core Methods in Educational Data Mining HUDK4050 Fall 2014.

Core Methods in Educational Data Mining

HUDK4050 Fall 2014

Demo of using Java

Activity

Second task

• Break into *different* 3-4 person groups than last time

• No overlap allowed

Second task

• Let’s take a quick look at homework C2

Second task

• Make up features for Assignment C2

• You need to
  – Come up with a new feature
  – Justify how you would create it from the data set
  – Justify why it would work

I need a volunteer

I need a volunteer

• Your task is to write down the features suggested

• And the counts for thumbs up/thumbs down

Now…

• Each group needs to read their favorite feature to the class and justify it

• Who thinks this feature will improve prediction of off-task behavior?

• Who doesn’t?

• Thumbs up, thumbs down!

Questions or comments?

Special Request

• Bring a print-out of your Assignment C2 solution to class on the day it’s due
  – Next Tuesday

Textbook

Automated Feature Generation

• What are the advantages of automated feature generation, as compared to feature engineering?

• What are the disadvantages?

Automated Feature Selection

• What are the advantages of automated feature selection, as compared to having a domain expert decide (as in the Sao Pedro paper from Monday)?

• What are the disadvantages?

A connection to make

A connection to make

• Correlation filtering

• Eliminating collinearity in statistics

• In this case, increasing interpretability and reducing over-fitting go together
  – At least to some positive degree
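To make the connection concrete, here is a minimal sketch of one simple form of correlation filtering: a feature is dropped when it is too highly correlated with a feature that has already been kept. The feature names, the toy values, and the 0.9 cutoff are all made up for illustration; they do not come from the course data set.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy features (illustrative values only).
features = {
    "time_on_step": [10, 20, 30, 40, 50],
    # Nearly the same information in different units: a collinear feature.
    "time_on_step_ms": [10100, 19900, 30200, 39800, 50100],
    "hints_requested": [3, 0, 2, 0, 1],
}

def drop_collinear(features, cutoff):
    """Keep a feature only if |r| with every already-kept feature is below cutoff."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < cutoff for k in kept):
            kept.append(name)
    return kept

kept = drop_collinear(features, cutoff=0.9)
print(kept)
```

The near-duplicate feature is removed, which leaves a smaller model that is easier to read and has fewer redundant parameters to over-fit with.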

Outer-loop forward selection

• What are the advantages and disadvantages to doing this?

Knowledge Engineering

• What is knowledge engineering?

Knowledge Engineering

• What is the difference between knowledge engineering and EDM?

Knowledge Engineering

• What is the difference between good knowledge engineering and bad knowledge engineering?

Knowledge Engineering

• What is the difference between (good) knowledge engineering and EDM?

• What are the advantages and disadvantages of each?

How can they be integrated?

FCBF: What Variables will be kept? (Cutoff = 0.65)

• What variables emerge from this table?

       G    H    I    J    K    L    Predicted
G      -    .7   .8   .8   .4   .3   .72
H      -    -    .8   .7   .6   .5   .38
I      -    -    -    .8   .3   .4   .82
J      -    -    -    -    .8   .1   .75
K      -    -    -    -    -    .5   .65
L      -    -    -    -    -    -    .42
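One way to work through the table is the sketch below. It assumes a simplified correlation-filter procedure: rank the features by their correlation with the predicted variable, keep the best one, discard every remaining feature whose correlation with it reaches the cutoff, and repeat on what is left. That procedure, the use of `>=` at the cutoff, and the function names are assumptions for illustration, not a statement of the exact algorithm in the textbook; the numbers are the ones on the slide.

```python
# Pairwise feature correlations from the slide's table (upper triangle only).
PAIRWISE = {
    ("G", "H"): 0.7, ("G", "I"): 0.8, ("G", "J"): 0.8, ("G", "K"): 0.4, ("G", "L"): 0.3,
    ("H", "I"): 0.8, ("H", "J"): 0.7, ("H", "K"): 0.6, ("H", "L"): 0.5,
    ("I", "J"): 0.8, ("I", "K"): 0.3, ("I", "L"): 0.4,
    ("J", "K"): 0.8, ("J", "L"): 0.1,
    ("K", "L"): 0.5,
}
# Correlation of each feature with the predicted variable (last column).
WITH_PREDICTED = {"G": 0.72, "H": 0.38, "I": 0.82, "J": 0.75, "K": 0.65, "L": 0.42}

def pair_corr(a, b):
    """Look up a pairwise correlation regardless of the order of the pair."""
    return PAIRWISE.get((a, b), PAIRWISE.get((b, a)))

def correlation_filter(with_predicted, cutoff):
    """Greedy filter: keep the best remaining feature, drop its near-duplicates."""
    remaining = sorted(with_predicted, key=lambda f: abs(with_predicted[f]), reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Assumption: a feature is dropped when |r| with the kept feature >= cutoff.
        remaining = [f for f in remaining if abs(pair_corr(best, f)) < cutoff]
    return kept

kept_features = correlation_filter(WITH_PREDICTED, cutoff=0.65)
print(kept_features)  # ['I', 'K', 'L']
```

Under these assumptions, I is kept first (.82 with the predicted variable), which removes G, H, and J (each correlated .8 with I); K and L survive because they correlate with I, and with each other, below the cutoff.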

Other questions, comments, concerns about textbook?

If you enjoyed today’s class…

• Next fall, I’ll be offering a Feature Engineering Design Studio course…

• Learn the feature engineering process in detail

• Create a model important to your research

• Submit a journal paper

Special Session

• Thursday 9/24, 3pm-4:30pm, Grace Dodge Hall 545

• An Inappropriately Brief Introduction to Frequentist Statistics

What if you can’t attend?

• Email me; I will send you the slides

Should you attend?

• Not mandatory

• Not necessary if you’ve taken a stats class that covers topics like Z, F, and Chi-squared tests

Next Class

• Tuesday, September 29

• Advanced Detector Evaluation and Validation

• Baker, R.S. (2015) Big Data and Education. Ch. 2, V5, V6.

• Rosenthal, R., Rosnow, R.L. (1991) Essentials of Behavioral Research: Methods and Data Analysis, 2nd edition. Ch. 22: Meta-Analysis.

• Rupp, A.A., Gushta, M., Mislevy, R.J., Shaffer, D.W. (2010) Evidence-Centered Design of Epistemic Games: Measurement Principles for Complex Learning Environments. The Journal of Technology, Learning, and Assessment, 8 (4), 4-47.

The End