Feature Engineering Studio Special Session January 26, 2015.

Feature Engineering Studio Special Session

January 26, 2015

Assignment One

• Problem Proposal
  – Due next Monday

• Be ready to talk for 3 minutes on:
  – A data set
    • Give where it came from and how big it is
    • You need to already have this data set, or be able to acquire it in the next two weeks
  – A prediction (or other statistical) model you will build in this data set
  – What variable will you predict?
  – What kind of variables will you use to predict it?
  – Why is this worth doing?

Example (Pardos et al., 2014)

• Data set
  – ASSISTments system, formative assessment and learning software for math used by 60k students a year (Razzaq et al., 2007)
  – 810,000 data points from 229 students studied
  – Student actions in the software have been overlaid with synchronized field codes of student affect (boredom, frustration, etc.)
    • 3075 field codes
    • Each field code connects to 20 seconds of log file actions
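The link between a field code and its 20 seconds of log actions can be pictured as a time-window join. The sketch below is only an illustration, not the procedure from Pardos et al. (2014): the record layout, the field names (`student`, `time`, `affect`), and the choice to end the window at the observation time are all assumptions.

```python
from dataclasses import dataclass

CLIP_SECONDS = 20  # from the slide: each field code covers 20 seconds of log actions


@dataclass
class Action:
    student: str
    time: float      # seconds since session start (hypothetical unit)
    correct: bool


@dataclass
class FieldCode:
    student: str
    time: float      # moment the observer recorded the code
    affect: str      # e.g. "bored", "frustrated"


def clip_for(code, actions, window=CLIP_SECONDS):
    """Return the same student's actions in the `window` seconds up to the code.

    Assumption: the window ends at the observation time. The slide only
    says that each code connects to 20 seconds of log actions.
    """
    start = code.time - window
    return [
        a for a in actions
        if a.student == code.student and start <= a.time <= code.time
    ]


# Made-up example data
actions = [
    Action("s1", 5.0, True),
    Action("s1", 18.0, False),
    Action("s1", 31.0, True),
    Action("s2", 20.0, True),
]
code = FieldCode("s1", 35.0, "bored")
clip = clip_for(code, actions)
print([a.time for a in clip])
```

Each (field code, clip) pair then becomes one training example: the affect label is the target, and the clip supplies the raw material for features.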

Example (Pardos et al., 2014)

• We will predict whether a student is bored at a specific time
  – So that we can replicate the human judgments without needing a field observer

• We will predict this from what was going on in the log files at the time the field observation was made
  – We know every student action’s correctness, timing, relevant skill, and probability they knew the skill
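As a rough sketch of how such log data might be distilled into predictors, the snippet below summarizes one observation window using the four kinds of information listed above. The field names and the particular aggregates are illustrative assumptions, not the features used by Pardos et al. (2014).

```python
from statistics import mean

# Hypothetical log records for the actions inside one observation window
clip = [
    {"correct": True,  "seconds": 4.0,  "skill": "fractions", "p_know": 0.5},
    {"correct": False, "seconds": 12.0, "skill": "fractions", "p_know": 0.25},
    {"correct": True,  "seconds": 2.0,  "skill": "decimals",  "p_know": 0.75},
]


def clip_features(clip):
    """Aggregate per-action fields into one feature row for the window."""
    if not clip:
        # No actions in the window; only the count is defined.
        return {"n_actions": 0}
    return {
        "n_actions": len(clip),
        "frac_correct": mean(1.0 if a["correct"] else 0.0 for a in clip),
        "mean_seconds": mean(a["seconds"] for a in clip),
        "max_seconds": max(a["seconds"] for a in clip),
        "n_skills": len({a["skill"] for a in clip}),
        "mean_p_know": mean(a["p_know"] for a in clip),
    }


features = clip_features(clip)
print(features)
```

Pairing one such row with the observer's boredom code for the same window gives a labeled example that a classifier can be trained on.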

Example (Pardos et al., 2014)

• This is worth doing because boredom is known to predict student learning (Craig et al., 2004; Rodrigo et al., 2009; Pekrun et al., 2010)

• And building a detector will help us study boredom more thoroughly

• As well as enabling us to intervene on boredom in real time

Important Considerations

• Is the problem genuinely important? (usable or publishable)

• Is there a good measure of ground truth? (the variable you want to predict)

• Do we have rich enough data to distill meaningful features?

• Is there enough data to be able to take advantage of data mining?

What concerns you?

• Data set
• What variable will you predict?
• What kind of variables will you use to predict it?

Data Set

• Who here has a data set, but has concerns about it?

• Who here doesn’t have a data set?

Online Learning

• ASSISTments (Neil Heffernan)
• Genetics Tutor (Albert Corbett)
• Impulse (Elizabeth Rowe)
• Inq-ITS (Janice Gobert)
• Physics Playground/Newton’s Playground (Val Shute)
• Refraction (Taylor Martin)
• Mathemantics (Herb Ginsburg)
• Vialogues (Gary Natriello, Hui Soo Chae)
• Project LISTEN (Jack Mostow)
• TC3-Sim (Robert Sottilare)

Online Learning

• Big Data and Education (me)
• Data, Analytics, and Learning (me)
• SQL-Tutor (Tanja Mitrovic)
• Project ARIES (Art Graesser)
• ALEKS (Xiangen Hu)
• Ecolab (Genaro Rebolledo-Mendez)
• Fractions Tutor (Vincent Aleven)
• Help Tutor (Ido Roll)
• InventionLab (Ido Roll)
• BlueJ (Matt Jadud)
• Aplusix (Jean-Francois Nicaud)
• Second Life (Bruce Homer)

Online Learning

• International use of Scatterplot Tutor (me)
• Zombie Division (Jake Habgood)
• Virtual Performance Assessments (Jody Clarke-Midura)
• EcoMUVE (Shari Metcalfe)
• Reasoning Mind (George Khachatryan)
• Chemistry Virtual Laboratory (David Yaron)
• Tuunu data (Fewof Mopfsan)

Potential Data Sources

• Grade data (Alex Bowers)
• Course-taking and dropout data (Cristobal Romero)
• BROMP data (me)
• Center for the Science of Learning Data (Krishna Srinivasan)

Procedure

• Pick a data set
• If I have it on hand, we talk right away
• If not, I broker a conversation

What variable will you predict?

• Something already directly labeled
  – Student was bored at 2:10:13 pm

• Something indirectly labeled
  – Student had 15% overall learning gain

• Something you can label with text replays
  – Student gamed the system while using learning system
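To make the "indirectly labeled" case concrete, here is a small sketch in which a per-student learning gain is derived from test scores and then attached to that student's log rows. The slide does not say how the gain is computed; post-test minus pre-test in percentage points is an assumption, and all names and numbers are made up.

```python
def learning_gain(pre_pct, post_pct):
    """Raw gain in percentage points (assumed definition: post minus pre)."""
    return post_pct - pre_pct


# Hypothetical pre/post-test scores, in percent correct
scores = {"s1": (60, 75), "s2": (40, 70), "s3": (80, 80)}
gains = {s: learning_gain(pre, post) for s, (pre, post) in scores.items()}

# Hypothetical log rows; every row inherits its student's overall gain,
# which is what makes the label indirect rather than tied to a moment.
log = [
    {"student": "s1", "action": "attempt"},
    {"student": "s2", "action": "hint"},
    {"student": "s1", "action": "attempt"},
]
labeled = [dict(row, gain=gains[row["student"]]) for row in log]
print(labeled)
```

A directly labeled variable, by contrast, would carry its own timestamp (as in the boredom example), so no such propagation step is needed.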

Let’s discuss specific data sets you guys are interested in

What kind of variables will you use as predictors?

• You don’t need to have specific ideas at this stage

• The main question is, do you have the right kind of data to be able to do this at all?

Questions? Concerns?