Core Methods in Educational Data Mining HUDK4050 Fall 2014.

Post on 13-Dec-2015

Core Methods in Educational Data Mining

HUDK4050 Fall 2014

Demo of using Java

Activity

Second task

• Break into *different* 3-4 person groups than last time

• No overlap allowed

Second task

• Let’s take a quick look at homework C2

Second task

• Make up features for Assignment C2

• You need to
  – Come up with a new feature
  – Justify how you would create it from the data set
  – Justify why it would work

I need a volunteer

• Your task is to write down the features suggested

• And the counts for thumbs up/thumbs down

Now…

• Each group needs to read their favorite feature to the class and justify it

• Who thinks this feature will improve prediction of off-task behavior?

• Who doesn’t?

• Thumbs up, thumbs down!

Questions or comments?

Special Request

• Bring a print-out of your Assignment C2 solution to class on the day it’s due
  – Next Tuesday

Textbook

Automated Feature Generation

• What are the advantages of automated feature generation, as compared to feature engineering?

• What are the disadvantages?

Automated Feature Selection

• What are the advantages of automated feature selection, as compared to having a domain expert decide? (as in the Sao Pedro paper from Monday)

• What are the disadvantages?

A connection to make

• Correlation filtering

• Eliminating collinearity in statistics

• In this case, increasing interpretability and reducing over-fitting go together
  – At least to some positive degree

Outer-loop forward selection

• What are the advantages and disadvantages to doing this?
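As one way to make the discussion concrete, here is a minimal sketch of greedy forward selection. It assumes a caller-supplied `score` function that rates a feature subset (in practice this would typically be cross-validated model performance); the function names, the `min_gain` parameter, and the toy scores are all hypothetical and not taken from the course materials.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_select(features: Sequence[str],
                   score: Callable[[FrozenSet[str]], float],
                   min_gain: float = 0.0) -> List[str]:
    """Greedily add the feature that most improves `score`.

    Stops when the best available addition improves the score by
    no more than `min_gain`.
    """
    selected: List[str] = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining:
        # Evaluate every remaining feature added to the current set.
        trials = [(score(frozenset(selected + [f])), f) for f in remaining]
        trial_score, trial_feature = max(trials, key=lambda t: t[0])
        if trial_score - best_score <= min_gain:
            break  # no candidate helps enough; stop searching
        selected.append(trial_feature)
        remaining.remove(trial_feature)
        best_score = trial_score
    return selected


# Hypothetical toy score (integer "points" to avoid float rounding):
# each feature contributes some value, but "a" and "b" are redundant.
TOY_VALUE = {"a": 30, "b": 25, "c": 5}


def toy_score(subset: FrozenSet[str]) -> float:
    total = sum(TOY_VALUE[f] for f in subset)
    if {"a", "b"} <= subset:
        total -= 25  # redundancy penalty: "b" adds nothing once "a" is in
    return total


print(forward_select(["a", "b", "c"], toy_score))
```

Note the cost: each pass calls `score` once per remaining feature, which is expensive when `score` means refitting and cross-validating a model, and the greedy search can miss feature combinations that only help together.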

Knowledge Engineering

• What is knowledge engineering?

Knowledge Engineering

• What is the difference between knowledge engineering and EDM?

Knowledge Engineering

• What is the difference between good knowledge engineering and bad knowledge engineering?

Knowledge Engineering

• What is the difference between (good) knowledge engineering and EDM?

• What are the advantages and disadvantages of each?

How can they be integrated?

FCBF: What Variables will be kept? (Cutoff = 0.65)

• What variables emerge from this table?

     G    H    I    J    K    L    Predicted
G         .7   .8   .8   .4   .3   .72
H              .8   .7   .6   .5   .38
I                   .8   .3   .4   .82
J                        .8   .1   .75
K                             .5   .65
L                                  .42
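The exercise can be worked through in code. The sketch below uses a simplified correlation filter in the spirit of FCBF, under one assumed reading of the slide: repeatedly keep the remaining variable most correlated with the predicted variable, then drop every remaining variable whose correlation with it is at or above the cutoff. This is an interpretation, not the published FCBF algorithm (which is defined in terms of symmetrical uncertainty), and the function names are hypothetical. The correlations are read from the table above.

```python
from typing import Dict, List, Tuple

# Feature-to-feature correlations from the table (upper triangle).
FEATURE_CORR: Dict[Tuple[str, str], float] = {
    ("G", "H"): 0.7, ("G", "I"): 0.8, ("G", "J"): 0.8, ("G", "K"): 0.4, ("G", "L"): 0.3,
    ("H", "I"): 0.8, ("H", "J"): 0.7, ("H", "K"): 0.6, ("H", "L"): 0.5,
    ("I", "J"): 0.8, ("I", "K"): 0.3, ("I", "L"): 0.4,
    ("J", "K"): 0.8, ("J", "L"): 0.1,
    ("K", "L"): 0.5,
}

# Correlation of each feature with the predicted variable.
TARGET_CORR: Dict[str, float] = {
    "G": 0.72, "H": 0.38, "I": 0.82, "J": 0.75, "K": 0.65, "L": 0.42,
}


def pair_corr(a: str, b: str) -> float:
    """Look up the correlation between two features in either order."""
    first, second = sorted((a, b))
    return FEATURE_CORR[(first, second)]


def correlation_filter(target_corr: Dict[str, float], cutoff: float) -> List[str]:
    """Keep the best remaining feature, drop its near-duplicates, repeat."""
    # Rank features from most to least correlated with the target.
    remaining = sorted(target_corr, key=target_corr.get, reverse=True)
    kept: List[str] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Assumption: a feature is redundant if its correlation with a
        # kept feature is at or above the cutoff.
        remaining = [f for f in remaining if pair_corr(best, f) < cutoff]
    return kept


print(correlation_filter(TARGET_CORR, cutoff=0.65))  # ['I', 'K', 'L']
```

Under this reading, I is kept first (0.82 with the predicted variable) and removes G, H, and J (each correlated 0.8 with I). K is kept next, and L survives because its correlation with K is 0.5. A different reading of the cutoff would give a different answer.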

Other questions, comments, concerns about textbook?

If you enjoyed today’s class…

• Next fall, I’ll be offering a Feature Engineering Design Studio course…

• Learn the feature engineering process in detail

• Create a model important to your research

• Submit a journal paper

Special Session

• Thursday 9/24, 3:00pm-4:30pm, Grace Dodge Hall 545

• An Inappropriately Brief Introduction to Frequentist Statistics

What if you can’t attend?

• Email me; I will send you the slides

Should you attend?

• Not mandatory

• Not necessary if you’ve taken a stats class that covers topics like Z, F, and Chi-squared tests

Next Class

• Tuesday, September 29

• Advanced Detector Evaluation and Validation

• Baker, R.S. (2015) Big Data and Education. Ch. 2, V5, V6.

• Rosenthal, R., Rosnow, R.L. (1991) Essentials of Behavioral Research: Methods and Data Analysis, 2nd edition. Ch. 22: Meta-Analysis.

• Rupp, A.A., Gushta, M., Mislevy, R.J., Shaffer, D.W. (2010) Evidence-Centered Design of Epistemic Games: Measurement Principles for Complex Learning Environments. The Journal of Technology, Learning, and Assessment, 8 (4), 4-47.

The End