Using Mixed-Effects Modeling to Compare Different Grain-Sized Skill Models

Mingyu Feng, Worcester Polytechnic Institute
Neil T. Heffernan, Worcester Polytechnic Institute
Murali Mani, Worcester Polytechnic Institute
Cristina Heffernan, Worcester Public Schools

July 17th, 2006 AAA’06-W06 2

The “ASSISTment” System

An e-assessment and e-learning system that does both ASSISTing of students and assessMENT (movie)
Massachusetts Comprehensive Assessment System (“MCAS”)
Web-based system built on Common Tutoring Object Platform (CTOP) [1]

[1] Nuzzo-Jones, G., Macasek, M.A., Walonoski, J., Rasmussen, K.P., Heffernan, N.T. Common Tutor Object Platform, an e-Learning Software Development Strategy. WPI technical report WPI-CS-TR-06-08.

We are giving away accounts!


ASSISTment

We break multi-step problems into “scaffolding questions”
“Hint Messages”: given on demand, they give hints about what step to do next
“Buggy Message”: a context-sensitive feedback message

Skills

The state reports to teachers on 5 areas
We seek to report on more and finer grain-sized skills
Demo (two triangles problem)

(Demo/movie) Screenshot labels from the two triangles problem:
The original question: a. Congruence, b. Perimeter, c. Equation-Solving
The 1st scaffolding question: Congruence
The 2nd scaffolding question: Perimeter
A buggy message
A hint message
Geometry


How Were the Skill Models Created?

Multi-mapped model (WPI-5) vs. single-mapped model (MCAS-5)?

[2] Pardos, Z.A., Heffernan, N.T., Anderson, B., & Heffernan, C. (2006). Using Fine-Grained Skill Models to Fit Student Performance with Bayesian Networks. Workshop in Educational Data Mining held at the Eighth International Conference on Intelligent Tutoring Systems. Taiwan. 2006.


Previous Work on Skill Models

Fine-grained skill models in reporting
Teachers get reports that they think are credible and useful. [3]

[3] Feng, M., Heffernan, N.T. (in press). Informing Teachers Live about Student Learning: Reporting in the Assistment System. To be published in Technology, Instruction, Cognition, and Learning Journal, Vol. 3. Old City Publishing, Philadelphia, PA. 2006.


Previous Work on Skill Models

Tracking skill performance over time [4][5]

[Figure: Growth of 5 Skills over Time for One Student. x-axis: Time (Sept to March); y-axis: Percent Correct (0 to 80); series: Geometry, Algebra, Measurement, Data Analysis, Number Sense]

[4] Feng, M., Heffernan, N.T., & Koedinger, K.R. (2006). Addressing the Testing Challenge with a Web-Based E-Assessment System that Tutors as it Assesses. Proceedings of the Fifteenth International World Wide Web Conference. pp. 307-316. ACM Press: New York, NY. 2006.
[5] Feng, M., Heffernan, N.T., & Koedinger, K.R. (2006). Predicting state test scores better with intelligent tutoring systems: developing metrics to measure assistance required. In Ikeda, Ashley & Chan (Eds.). Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Springer-Verlag: Berlin. pp. 31-40. 2006.


In this work, we compare different grain-sized skill models by the accuracy of their predictions of state test scores.


Research Questions

RQ1: Would adding response data from scaffolding questions help us do a better job of tracking students’ knowledge?

RQ2: How does the finer-grained skill model (WPI-78) do on estimating external test scores compared to the skill model with only 5 categories (WPI-5) and the one with only one category (WPI-1)?

RQ3: Does introducing item difficulty information help to build a better predictive model?


Data Source

497 students of two middle schools
Students used the ASSISTment system every other week from Sep. 2004 to May 2005
Real state test score in May 2005
Item-level online data
  students’ binary response (1/0) to items that are tagged in different skill models
Some statistics
  Average usage: 7.3 days; minimum usage: 6 days
  138,000 data points (43,000 original data points)
  Average questions answered: Original: 87, Scaffolding: 189

Online data of 700 8th grade students available for researchers! If you want access, talk to Neil Heffernan and Kenneth Koedinger.


How is the Data Organized?


Approach

Fit a mixed-effects logistic regression model on the longitudinal online data
  using skills as a factor
  predicting prob(response=1) on an item tagged with a certain skill at a certain time
  The fitted model gives learning parameters (initial knowledge + learning rate) of each skill for each individual student

Predict State Test Scores
  Identify skills associated with each test item in all skill models
  Student full score = sum of item fractional scores (prob(response=1))

Compare skill models by Mean Absolute Difference (MAD) and %Err (= MAD / full score)
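The prediction and comparison steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the fitted model has already produced, for one student, an intercept (initial knowledge) and a slope (learning rate) per skill on the logit scale, and that each test item is tagged with a single skill. The function names, the skill parameters, and the item tagging are all hypothetical.

```python
import math

def prob_correct(intercept, slope, months):
    """P(response=1) on the logit scale: initial knowledge plus learning rate times time."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * months)))

def predicted_score(item_skills, skill_params, months):
    """Predicted test score: the sum of item fractional scores, one per test item."""
    total = 0.0
    for skill in item_skills:
        intercept, slope = skill_params[skill]
        total += prob_correct(intercept, slope, months)
    return total

def mean_abs_difference(real_scores, predicted_scores):
    """MAD between real and predicted test scores over a group of students."""
    pairs = list(zip(real_scores, predicted_scores))
    return sum(abs(real - pred) for real, pred in pairs) / len(pairs)

def percent_error(mad, full_score):
    """%Err = MAD / full score, expressed as a percentage."""
    return 100.0 * mad / full_score

# Hypothetical per-skill parameters for one student (not taken from the study).
skill_params = {"Geometry": (0.0, 0.1), "Algebra": (-0.5, 0.2)}
# Hypothetical tagging of three test items with one skill each.
test_items = ["Geometry", "Algebra", "Algebra"]
score = predicted_score(test_items, skill_params, months=8)
```

Because each item contributes a probability rather than a 0/1 outcome, the predicted score is fractional, which is why the predicted scores reported for students are not whole numbers.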


Data Preprocessing Strategies

Scaffolding Credit
  Scaffolding only shows in case of a wrong answer to the original question
  We assume correct responses to all scaffolding questions if a student correctly answered the original one

Partial Blame
  Blame only the skill with the worst overall performance
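The two preprocessing strategies can be illustrated with a small Python sketch. It is written under stated assumptions rather than taken from the ASSISTment code base: the function names are hypothetical, responses are coded 1/0 as in the data description, and "worst overall performance" is read here as the lowest percent correct for a skill across the data set.

```python
def scaffolding_credit(original_correct, scaffold_count, logged_responses):
    """Responses (1/0) to an item's scaffolding questions.

    If the original question was answered correctly, the scaffolds were never
    shown, so every scaffold is credited as correct. Otherwise the responses
    actually logged for the scaffolds are used unchanged.
    """
    if original_correct:
        return [1] * scaffold_count
    return list(logged_responses)

def partial_blame(item_skills, overall_percent_correct):
    """For a wrong answer on a multi-skill item, pick the one skill to blame.

    Assumption: the blamed skill is the one with the lowest overall percent correct.
    """
    return min(item_skills, key=lambda skill: overall_percent_correct[skill])

# Hypothetical overall performance per skill (illustrative numbers only).
overall = {"Congruence": 62.0, "Perimeter": 48.0, "Equation-Solving": 55.0}
blamed = partial_blame(["Congruence", "Perimeter", "Equation-Solving"], overall)
credited = scaffolding_credit(True, 3, [])
```

With these illustrative numbers, a wrong answer on the three-skill original question would be attributed to Perimeter only, and the other two skills would not be penalized.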


RQ1: Will Scaffolding Response Help?

Why?
  Using more training data
  Deal with credit-blame issue better
  More “identifiability” per skill
  Scaffolding questions provide valuable information [4][5][6][7]

Real MCAS score vs. Assistment predicted score (WPI-78), with the absolute difference between the two

Student   Real MCAS   Predicted score               Absolute difference
          score       Orig.    Orig. + Scaffolds    Orig.    Orig. + Scaffolds
Mary      29          22.93    27.05                6.06     1.95
Tom       28          19.38    25.35                8.62     2.65
…
Sue       25          18.58    24.10                6.42     0.90
Dick      22          16.57    21.31                5.43     0.69
Harry     33          18.66    28.12                14.34    4.88

MAD                                                 6.03     4.121
%Error                                              17.75%   12.12%

Answer: Yes!

[6] Walonoski, J., Heffernan, N.T. (2006). Detection and Analysis of Off-Task Gaming Behavior in Intelligent Tutoring Systems. In Ikeda, Ashley & Chan (Eds.). Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Springer-Verlag: Berlin. pp. 382-391. 2006.
[7] Walonoski, J., Heffernan, N.T. (2006). Prevention of Off-Task Gaming Behavior in Intelligent Tutoring Systems. In Ikeda, Ashley & Chan (Eds.). Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Springer-Verlag: Berlin. pp. 722-724. 2006.
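As a quick arithmetic check, the per-student absolute differences can be recomputed from the real and predicted scores of the five named students on this slide. The sketch below only uses those five rows; the slide's MAD is computed over all students in the study, so a mean over these rows is not expected to reproduce it. The variable and function names are illustrative.

```python
# (real MCAS score, predicted with original questions only,
#  predicted with original + scaffolding questions), as shown on the slide
rows = {
    "Mary": (29, 22.93, 27.05),
    "Tom": (28, 19.38, 25.35),
    "Sue": (25, 18.58, 24.10),
    "Dick": (22, 16.57, 21.31),
    "Harry": (33, 18.66, 28.12),
}

def abs_differences(rows):
    """Map each student to (|real - orig|, |real - orig+scaffolds|), rounded to 2 places."""
    result = {}
    for name, (real, orig_only, with_scaffolds) in rows.items():
        result[name] = (
            round(abs(real - orig_only), 2),
            round(abs(real - with_scaffolds), 2),
        )
    return result

diffs = abs_differences(rows)
# True when the prediction that uses scaffolding responses is closer for every listed student.
scaffolds_closer = all(with_s < orig for orig, with_s in diffs.values())
```

For every one of the five listed students, the prediction that uses scaffolding responses lands closer to the real score than the prediction based on original questions alone.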


RQ2: Does a finer-grained model predict better?

Real MCAS score vs. Assistment predicted score (scaffolding response used)

Student   Real MCAS   Predicted score              Absolute difference
          score       WPI-1    WPI-5    WPI-78     WPI-1    WPI-5    WPI-78
Mary      29          28.59    27.65    27.05      0.41     1.35     1.95
Tom       28          27.58    26.43    25.35      0.42     1.57     2.65
…
Sue       25          26.56    24.94    24.10      1.56     0.06     0.90
Dick      22          23.70    22.78    21.31      1.70     0.78     0.69
Harry     33          27.54    26.37    28.12      5.46     6.63     4.88

MAD                                                4.552    4.343    4.121
%Error                                             13.39%   12.77%   12.12%

MAD: WPI-1 > WPI-5 > WPI-78. P-values of both paired t-tests are below 0.05.

Is 12.12% any good for assessment purposes? MCAS-simulation result: 11.12%
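The relation %Err = MAD / full score can be checked against the reported numbers, and the paired comparison between two skill models can be sketched with the standard library. Two assumptions are made here and are not stated on the slides: the full score of 34 is inferred from the reported MAD and %Error pairs, and the paired t statistic uses the textbook formula (mean of the paired differences divided by their standard error). The five-row sample is for illustration only and says nothing about the significance reported for the full data set.

```python
import math
import statistics

ASSUMED_FULL_SCORE = 34  # assumption: inferred from MAD / %Error, not stated in the slides

reported_mad = {"WPI-1": 4.552, "WPI-5": 4.343, "WPI-78": 4.121}
# %Err recomputed from the reported MAD values.
pct_error = {
    model: round(100.0 * mad / ASSUMED_FULL_SCORE, 2)
    for model, mad in reported_mad.items()
}

def paired_t_statistic(a, b):
    """t statistic for paired samples: mean(d) / (stdev(d) / sqrt(n)), with d = a - b."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

# Absolute differences for the five students listed on the slide (illustrative subset).
abs_diff_wpi1 = [0.41, 0.42, 1.56, 1.70, 5.46]
abs_diff_wpi78 = [1.95, 2.65, 0.90, 0.69, 4.88]
t_subset = paired_t_statistic(abs_diff_wpi1, abs_diff_wpi78)
```

Under the assumed full score, the recomputed percentages agree with the %Error row for all three skill models.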


Conclusion

Recall RQ1 and RQ2: the answer to both is positive.
RQ3: Item difficulty was introduced as a factor to improve the predictive models. We ended up with better internally fitted models but, surprisingly, no significant improvement in the prediction of state test scores.

Some of the ASSISTMENT TEAM (2004-2005)

Leena RAZZAQ, Mingyu FENG, Goss NUZZO-JONES, Neil T. HEFFERNAN, Kenneth KOEDINGER+, Brian JUNKER+, Steven RITTER, Andrea KNIGHT+, Edwin MERCADO*, Terrence E. TURNER, Ruta UPALEKAR, Jason A. WALONOSKI, Michael A. MACASEK, Christopher ANISZCZYK, Sanket CHOKSEY, Tom LIVAK, Kai RASMUSSEN

* This research was made possible by the US Dept of Education, Institute of Education Science, "Effective Mathematics Education Research" program grant #R305K03140, the Office of Naval Research grant # N00014-03-1-0221, NSF CAREER award to Neil Heffernan, and the Spencer Foundation. Authors Razzaq and Mercado were funded by the National Science Foundation under Grant No. 0231773. All the opinions in this article are those of the authors, and not those of any of the funders.

Carnegie Learning
