Page 1: Core Methods in Educational Data Mining HUDK4050 Fall 2014.

Core Methods in Educational Data Mining

HUDK4050 Fall 2014

Page 2

Demo of using Java

Page 3

Activity

Page 4

Second task

• Break into 3-4 person groups that are *different* from last time

• No overlap allowed

Page 5

Second task

• Let’s take a quick look at homework C2

Page 6

Second task

• Make up features for Assignment C2

• You need to:
  – Come up with a new feature
  – Justify how you could derive it from the data set
  – Justify why it would work

Page 7

I need a volunteer

Page 8

I need a volunteer

• Your task is to write down the features suggested

• And the counts for thumbs up/thumbs down

Page 9

Now…

• Each group needs to read their favorite feature to the class and justify it

• Who thinks this feature will improve prediction of off-task behavior?

• Who doesn’t?

• Thumbs up, thumbs down!

Page 10

Questions or comments?

Page 11

Special Request

• Bring a print-out of your Assignment C2 solution to class on the day it’s due
  – Next Tuesday

Page 12

Textbook

Page 13

Automated Feature Generation

• What are the advantages of automated feature generation, as compared to feature engineering?

• What are the disadvantages?
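To make the contrast concrete, here is a minimal Python sketch of one simple form of automated feature generation: mechanically combining every pair of base features into products and ratios, with no domain judgment about which combinations make sense. The function name, the example feature names, and the choice of products and ratios as operators are illustrative assumptions, not a method taken from the textbook.

```python
from itertools import combinations
from typing import Dict


def generate_features(row: Dict[str, float]) -> Dict[str, float]:
    """Derive candidate features from every pair of base features.

    Illustrative only: products and ratios are one arbitrary choice of
    combination operators, applied blindly to all pairs.
    """
    generated: Dict[str, float] = dict(row)  # keep the original features
    for a, b in combinations(sorted(row), 2):
        generated[f"{a}*{b}"] = row[a] * row[b]
        # Skip the ratio when the denominator is zero instead of producing inf.
        if row[b] != 0:
            generated[f"{a}/{b}"] = row[a] / row[b]
    return generated


# Hypothetical base features for one student action.
example = {"hints": 2.0, "seconds": 30.0, "errors": 1.0}
print(len(generate_features(example)))  # prints 9: 3 base + 3 products + 3 ratios
```

With n base features this yields on the order of n² candidates, none of which anyone has justified. That is one way to frame both the appeal (broad coverage, no expert time) and the cost (harder interpretation, more chances to over-fit) raised in the questions above.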

Page 14

Automated Feature Selection

• What are the advantages of automated feature selection, as compared to having a domain expert decide? (as in the Sao Pedro paper from Monday)

• What are the disadvantages?

Page 15

A connection to make

Page 16

A connection to make

• Correlation filtering

• Eliminating collinearity in statistics

• In this case, increasing interpretability and reducing over-fitting go together
  – At least to some positive degree
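The shared idea behind correlation filtering and collinearity checks is to find pairs of features that carry nearly the same information. A minimal Python sketch, assuming plain Pearson correlation and an analyst-chosen cutoff (both assumptions; the feature names and values are made up for illustration), flags such pairs so that one member of each can be dropped:

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / sqrt(ss_x * ss_y)


def collinear_pairs(features: Dict[str, Sequence[float]],
                    cutoff: float) -> List[Tuple[str, str, float]]:
    """Return (feature, feature, r) for every pair with |r| >= cutoff."""
    flagged = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= cutoff:
            flagged.append((a, b, r))
    return flagged


# Hypothetical features: hint_time is an exact multiple of hints.
data = {
    "hints": [1, 2, 3, 4, 5],
    "hint_time": [2, 4, 6, 8, 10],
    "errors": [3, 1, 4, 1, 5],
}
print(collinear_pairs(data, cutoff=0.65))  # prints [('hint_time', 'hints', 1.0)]
```

Which member of a flagged pair to drop is left open here. Dropping either one gives a smaller model that is easier to interpret and has fewer redundant parameters to over-fit, which is the connection the slide draws.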

Page 17

Outer-loop forward selection

• What are the advantages and disadvantages to doing this?
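As a reference point for this discussion, here is a minimal Python sketch of greedy forward selection wrapped around an evaluation function. It assumes `evaluate` returns a goodness score for a feature set (in practice, something like a cross-validated metric); the toy scorer and its values are purely hypothetical. The sketch does not settle how the evaluation relates to the cross-validation folds, and that choice affects how trustworthy the final score is.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_selection(candidates: Sequence[str],
                      evaluate: Callable[[FrozenSet[str]], float]) -> List[str]:
    """Greedily add the feature that most improves `evaluate`; stop when none helps."""
    selected: List[str] = []
    remaining = list(candidates)
    best_score = evaluate(frozenset())  # score of the empty model
    while remaining:
        # Score every one-feature extension of the current set.
        score, feature = max(
            (evaluate(frozenset(selected + [f])), f) for f in remaining
        )
        if score <= best_score:
            break  # no remaining feature improves the score
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected


# Hypothetical scorer: each feature adds some value, but "a" and "b" are
# redundant, so using both is penalized.
GAIN = {"a": 0.30, "b": 0.20, "c": 0.05}


def toy_evaluate(features: FrozenSet[str]) -> float:
    penalty = 0.25 if {"a", "b"} <= features else 0.0
    return sum(GAIN[f] for f in features) - penalty


print(forward_selection(["a", "b", "c"], toy_evaluate))  # prints ['a', 'c']
```

Because `evaluate` is called on the same data at every step, a score that is also reported as final performance will tend to be optimistic: the features were chosen to maximize that very number.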

Page 18

Knowledge Engineering

• What is knowledge engineering?

Page 19

Knowledge Engineering

• What is the difference between knowledge engineering and EDM?

Page 20

Knowledge Engineering

• What is the difference between good knowledge engineering and bad knowledge engineering?

Page 21

Knowledge Engineering

• What is the difference between (good) knowledge engineering and EDM?

• What are the advantages and disadvantages of each?

Page 22

How can they be integrated?

Page 23

FCBF: What Variables will be kept? (Cutoff = 0.65)

• What variables emerge from this table?

     G    H    I    J    K    L    Predicted
G         .7   .8   .8   .4   .3   .72
H              .8   .7   .6   .5   .38
I                   .8   .3   .4   .82
J                        .8   .1   .75
K                             .5   .65
L                                  .42
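One way to check an answer to this exercise is the minimal Python sketch below. It encodes the correlations from this slide and applies a simplified, correlation-based reading of FCBF: rank features by correlation with the predicted variable, keep the strongest, drop every remaining feature whose correlation with it meets the cutoff, and repeat. Two things here are assumptions rather than anything stated on the slide: that the 0.65 cutoff applies only to feature-to-feature correlations (not to each feature's correlation with the predicted variable), and the hypothetical function name `fcbf_simplified`.

```python
from typing import Dict, FrozenSet, List

# Correlations with the predicted variable (from the slide).
TARGET = {"G": .72, "H": .38, "I": .82, "J": .75, "K": .65, "L": .42}

# Upper triangle of the feature-feature correlation table (from the slide).
UPPER = {
    "G": {"H": .7, "I": .8, "J": .8, "K": .4, "L": .3},
    "H": {"I": .8, "J": .7, "K": .6, "L": .5},
    "I": {"J": .8, "K": .3, "L": .4},
    "J": {"K": .8, "L": .1},
    "K": {"L": .5},
}
PAIRS: Dict[FrozenSet[str], float] = {
    frozenset((a, b)): r for a, row in UPPER.items() for b, r in row.items()
}


def fcbf_simplified(target: Dict[str, float],
                    pairs: Dict[FrozenSet[str], float],
                    cutoff: float) -> List[str]:
    """Keep the best remaining feature, drop features redundant with it, repeat."""
    # Strongest association with the predicted variable first.
    remaining = sorted(target, key=lambda f: target[f], reverse=True)
    kept: List[str] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Remove features whose correlation with the kept one meets the cutoff.
        remaining = [f for f in remaining
                     if abs(pairs[frozenset((best, f))]) < cutoff]
    return kept


print(fcbf_simplified(TARGET, PAIRS, cutoff=0.65))  # prints ['I', 'K', 'L']
```

Under a stricter reading that also discards features whose correlation with the predicted variable falls below the cutoff, L (.42) would be dropped as well, so it is worth confirming which reading the textbook intends.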

Page 24

Other questions, comments, concerns about textbook?

Page 25

If you enjoyed today’s class…

• Next fall, I’ll be offering a Feature Engineering Design Studio course…

• Learn the feature engineering process in detail

• Create a model important to your research

• Submit a journal paper

Page 26

Special Session

• Thursday 9/24, 3:00pm-4:30pm, Grace Dodge Hall 545

• An Inappropriately Brief Introduction to Frequentist Statistics

Page 27

What if you can’t attend?

• Email me; I will send you the slides

Page 28

Should you attend?

• Not mandatory

• Not necessary if you’ve taken a stats class that covers topics like Z, F, and Chi-squared tests

Page 29

Next Class

• Tuesday, September 29

• Advanced Detector Evaluation and Validation

• Baker, R.S. (2015) Big Data and Education. Ch. 2, V5, V6.

• Rosenthal, R., Rosnow, R.L. (1991) Essentials of Behavioral Research: Methods and Data Analysis, 2nd edition. Ch. 22: Meta-Analysis.

• Rupp, A.A., Gushta, M., Mislevy, R.J., Shaffer, D.W. (2010) Evidence-Centered Design of Epistemic Games: Measurement Principles for Complex Learning Environments. The Journal of Technology, Learning, and Assessment, 8 (4), 4-47.

Page 30

The End