Peer-Grading in a Course on Algorithms and Data...
ei.is.tue.mpg.de/~msajjadi
Peer-Grading in a Course on Algorithms and Data Structures
Machine Learning Algorithms do not Improve over Simple Baselines
Mehdi S. M. Sajjadi, Morteza Alamgir, Ulrike von Luxburg
MPI IS Tübingen, Uni Hamburg, Uni Tübingen
Learning @ Scale, 26.04.2016
Peer-Grading Setting
• How to aggregate grades?
• Are peer grades as accurate as TA grades?
• Challenges
  ◦ Inexperience
  ◦ Bias
  ◦ Limited number of grades
  ◦ Cheating
Our University Class
• Algorithms and Data Structures (tasks ranging from easy to demanding)
• 1 semester, 220 students, ~14,000 grades
• Groups of 3 students for solving exercises
• Detailed scoring rubric
• Grading done by each student individually
• All submissions graded in 3 different ways:
  ◦ Self-assessment, peer grading, TA grading
A glance at one exercise…
• Easy multiple choice with proofs, 0 or 1 point each
• Mean absolute deviation from TA grade
  ◦ 0.08 Peer
  ◦ 0.12 Self
  ◦ 0.08 Peer + Self
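The deviation figures above can be reproduced in spirit with a few lines of Python. This is a minimal sketch on made-up toy data, not the course dataset. It assumes that peer grades are aggregated by their mean and that "Peer + Self" means pooling the self-assessment with the peer grades as one more grade; the slides do not state how the two are combined, and all names below are hypothetical.

```python
from statistics import mean

def mean_absolute_deviation(estimates, ta_grades):
    """Average |estimate - TA grade| over all submissions."""
    return mean(abs(e - t) for e, t in zip(estimates, ta_grades))

# Hypothetical toy data: one entry per submission, grades in [0, 1].
ta_grades = [1.0, 0.0, 1.0, 1.0]
peer_grades = [[1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
self_grades = [1.0, 1.0, 1.0, 1.0]

# Aggregate peer grades by their mean (assumption).
peer_mean = [mean(g) for g in peer_grades]
# "Peer + Self": treat the self-assessment as one additional grade (assumption).
peer_plus_self = [mean(g + [s]) for g, s in zip(peer_grades, self_grades)]

mad_peer = mean_absolute_deviation(peer_mean, ta_grades)
mad_self = mean_absolute_deviation(self_grades, ta_grades)
mad_peer_self = mean_absolute_deviation(peer_plus_self, ta_grades)
```

On this toy data the self-assessments deviate most from the TA grades, but that is a property of the invented numbers only.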
…and another exercise.
• Design an algorithm, prove its runtime
• Algorithms with bad runtime
• Students did not realize the mistake
• Mean absolute deviation from TA grade
  ◦ 0.18 Peer
  ◦ 0.28 Self
  ◦ 0.21 Peer + Self
Overall Grade Comparison
• Overall bias
  ◦ 0.06 Peer
  ◦ 0.12 Self
• Large variance
• Good base for improvements?
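As a rough illustration of the two quantities on this slide, the sketch below computes bias and variance of the deviations from the TA grade. It assumes that bias means the mean signed difference between an estimate and the TA grade, which the slides do not define explicitly; the data and function name are hypothetical.

```python
from statistics import mean, pvariance

def bias_and_variance(estimates, ta_grades):
    """Mean and population variance of the signed differences (estimate - TA grade)."""
    diffs = [e - t for e, t in zip(estimates, ta_grades)]
    return mean(diffs), pvariance(diffs)

# Hypothetical toy data, grades normalized to [0, 1].
ta = [0.5, 0.8, 0.2, 0.9]
peer = [0.7, 0.7, 0.4, 1.0]

bias, var = bias_and_variance(peer, ta)
```

Under this sign convention a positive bias means the estimates are on average higher than the TA grades. A small bias together with a large variance indicates that individual grades scatter widely even though they are roughly correct on average.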
Probabilistic Model
• $z_{ag} \sim \mathcal{N}(s_a + b_g,\ 1/r_g)$ (reported scores)
• $s_a \sim \mathcal{N}(\mu,\ \sigma^2)$ (true scores)
• $b_g \sim \mathcal{N}(0,\ \eta^2)$ (bias)
• $r_g \sim \Gamma(\alpha,\ \beta)$ (reliability)
• EM algorithm for parameter estimation
• Works well on artificial data
  ◦ Other algorithms in the literature yield similar results
C. Piech et al. Tuned Models of Peer Assessment in MOOCs. EDM, 2013.
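To make the generative story concrete, the following sketch samples artificial data from the model on this slide: a true score per assignment, a bias and a reliability per grader, and a reported score drawn around true score plus bias with variance 1/reliability. It only generates data and does not implement the EM estimation. The hyperparameter values, the function name, and the choice to read β as a rate parameter are assumptions made for illustration.

```python
import random

def sample_peer_grades(n_submissions, n_graders, grades_per_submission,
                       mu=0.7, sigma=0.2, eta=0.1, alpha=10.0, beta=0.1, seed=0):
    """Draw synthetic grades from the slide's model (hypothetical hyperparameters)."""
    rng = random.Random(seed)
    # True score s_a of each assignment a.
    true_scores = [rng.gauss(mu, sigma) for _ in range(n_submissions)]
    # Bias b_g of each grader g.
    biases = [rng.gauss(0.0, eta) for _ in range(n_graders)]
    # Reliability r_g of each grader. random.gammavariate takes (shape, scale);
    # beta is treated as a rate here (assumption), hence scale = 1 / beta.
    reliabilities = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n_graders)]
    reported = []
    for a in range(n_submissions):
        # Each assignment is graded by a random subset of graders.
        for g in rng.sample(range(n_graders), grades_per_submission):
            std = (1.0 / reliabilities[g]) ** 0.5  # variance is 1 / r_g
            reported.append((a, g, rng.gauss(true_scores[a] + biases[g], std)))
    return true_scores, biases, reliabilities, reported

true_scores, biases, reliabilities, reported = sample_peer_grades(20, 30, 6)
```

Data generated this way satisfies the model assumptions by construction, which is one plausible reading of why the models work well on artificial data while behaving differently on real grades.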
Results on our dataset
What about the ranking?
Why no improvements?
Possible Reasons I
• Amount of data? (6 peer grades per assignment)
  ◦ Same results with 15 peer grades
  ◦ Bias and reliability estimation over all assignments
• Wrong priors?
  ◦ They barely change the results
• Unmotivated students / useless reviews?
  ◦ Very few, and models should benefit from them anyway
• Different types of assignments?
  ◦ Grouping them by grading difficulty did not change results
Possible Reasons II
• (Different) TAs as baseline?
  ◦ The 6 TAs graded similarly
  ◦ Some noise in the ground truth does not hurt the models
• Bias vs. reliability!
  ◦ Most errors are due to low reliabilities, not bias
  ◦ This kind of error is hard to correct (artificial models)
• Other sources of errors!
  ◦ Errors are often a result of lacking knowledge
  ◦ Hard to correct for this
Conclusion
• Models fail to improve over mean estimator
• Main reasons
  ◦ Sources of errors are different from the assumption
  ◦ Not much bias
  ◦ Reliability difficult to estimate and correct
• Are complicated models acceptable for students?
• Is peer grading a viable option for university courses?
• The dataset is publicly available on our websites
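For readers who want to try the comparison on the published dataset, here is a minimal sketch of the mean estimator next to a deliberately simple one-step bias correction. The correction is a simplified stand-in chosen for illustration; it is not the EM-based model from the talk, and the input format (submission, grader, score) and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_estimator(grades):
    """grades: list of (submission, grader, score). Returns {submission: mean score}."""
    by_submission = defaultdict(list)
    for sub, _, score in grades:
        by_submission[sub].append(score)
    return {sub: mean(scores) for sub, scores in by_submission.items()}

def bias_corrected_estimator(grades):
    """Subtract each grader's average deviation from the submission means, then average."""
    base = mean_estimator(grades)
    residuals = defaultdict(list)
    for sub, grader, score in grades:
        residuals[grader].append(score - base[sub])
    # Estimated bias of each grader: mean residual over everything they graded.
    bias = {grader: mean(r) for grader, r in residuals.items()}
    corrected = defaultdict(list)
    for sub, grader, score in grades:
        corrected[sub].append(score - bias[grader])
    return {sub: mean(scores) for sub, scores in corrected.items()}

# Hypothetical toy data.
grades = [("A", "g1", 0.9), ("A", "g2", 0.7),
          ("B", "g1", 0.6), ("B", "g3", 0.4),
          ("C", "g2", 0.5), ("C", "g3", 0.5)]

baseline = mean_estimator(grades)
corrected = bias_corrected_estimator(grades)
```

If grader errors are mostly unsystematic noise rather than a consistent offset, the estimated biases stay close to zero and the corrected scores barely differ from the plain means, which matches the conclusion stated above.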