Discovering Optimal Training Policies: A New Experimental Paradigm

Robert V. Lindsey, Michael C. Mozer (Institute of Cognitive Science, Department of Computer Science, University of Colorado, Boulder); Harold Pashler (Department of Psychology, UC San Diego)


Transcript of Discovering Optimal Training Policies: A New Experimental Paradigm

Optimizing Performance

Common Experimental Paradigm In Human Learning Research
- Propose several instructional conditions to compare, based on intuition or theory
- E.g., spacing of study sessions in fact learning
  - Equal: 1 1 1
  - Increasing: 1 2 4
- Run many participants in each condition
- Perform statistical analyses to establish a reliable difference between conditions

What Most Researchers Interested In Improving Instruction Really Want To Do

Find the best training policy (study schedule)
- Abscissa: the space of all training policies
- A performance function defined over the policy space

At this point, build up to the punch line: it would seem hopeless to optimize, because you'd have to run hundreds of participants at each of a very large number of policies.

Approach
- Perform single-participant experiments at selected points in policy space (o)
- Use function-approximation techniques to estimate the shape of the performance function
- Given the current estimate, select promising policies to evaluate next
- "Promising" = has the potential to be the optimum policy

[Figure: linear regression vs. Gaussian process regression]
There are much fancier nonlinear techniques from machine learning and statistics, like Gaussian process surrogate-based regression, which allow functions of arbitrary shape with the only constraint being smoothness.

Gaussian Process Regression (a minimal sketch follows)
- Assumes only that functions are smooth
- Uses data efficiently
- Accommodates noisy data
- Produces estimates of both function shape and uncertainty
- It is a Bayesian technique
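As a rough illustration of Gaussian process regression for this purpose (a minimal sketch, not the authors' implementation; the policy values, scores, and kernel settings below are made up), scikit-learn can fit a smooth performance function with uncertainty estimates from noisy per-participant scores:

```python
# A minimal sketch (not the authors' code): estimate a smooth performance
# function over a 1D policy space from noisy single-participant scores.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical data: each row is one policy that one participant was run at,
# paired with that participant's (noisy) test score.
policies = np.array([[0.1], [0.3], [0.5], [0.7], [0.9]])   # points in policy space
scores = np.array([0.42, 0.55, 0.68, 0.61, 0.50])          # observed performance

# The RBF kernel encodes the smoothness assumption; WhiteKernel absorbs
# participant-to-participant noise.
kernel = 1.0 * RBF(length_scale=0.2) + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(policies, scores)

# Posterior mean and uncertainty over the whole policy space.
grid = np.linspace(0, 1, 100).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)
print("estimated best policy:", grid[np.argmax(mean), 0])
```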

Simulated Experiment

Embellishments On Off-The-Shelf GP Regression
- Active selection heuristic: upper confidence bound (sketched below)
- The GP is embedded in a generative task model:
  - The GP represents skill level (-∞, +∞)
  - Mapped to population mean accuracy on the test (0, 1)
  - Mapped to an individual's mean accuracy, allowing for interparticipant variability
  - Mapped to number of correct responses via binomial sampling
- Hierarchical Bayesian approach to parameter selection:
  - Interparticipant variability
  - GP smoothness (covariance function)
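The snippet below is my own simplified sketch of that generative mapping and the upper-confidence-bound selection rule (not the authors' hierarchical model; the logistic link, the beta draw for interparticipant variability, and all constants are assumptions for illustration). It pairs with a fitted GP like the one sketched above:

```python
# Simplified sketch of the generative mapping and UCB selection
# (my own illustration; the real model is hierarchical Bayesian).
import numpy as np
from scipy.special import expit      # logistic link: skill in (-inf, inf) -> (0, 1)
from scipy.stats import beta, binom

rng = np.random.default_rng(0)

def simulate_score(skill, n_test=24, concentration=20.0):
    """Latent skill -> population accuracy -> individual accuracy -> # correct."""
    pop_accuracy = expit(skill)                       # population mean accuracy
    indiv_accuracy = beta.rvs(concentration * pop_accuracy,
                              concentration * (1.0 - pop_accuracy),
                              random_state=rng)       # interparticipant variability
    return binom.rvs(n_test, indiv_accuracy, random_state=rng)

def ucb_choice(candidate_policies, gp, kappa=2.0):
    """Pick the candidate policy with the highest upper confidence bound."""
    mean, std = gp.predict(candidate_policies, return_std=True)
    return candidate_policies[np.argmax(mean + kappa * std)]
```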

Concept Learning Task

GLOPNOR = Graspability
- Ease of picking up and manipulating an object with one hand
- Based on norms from Salmon, McMullen, & Filliter (2010)

- Graspability rated on a 1-5 scale; GLOPNOR defined as rating > 3

Two-Dimensional Policy Space
Fading policy

Repetition/alternation policy

Two-Dimensional Policy Space

Policy Space

[Figure: policy space with fading policy and repetition/alternation policy as axes]
We have gone from the 1D policy space of the earlier example to a 2D space.

Experiment
Training
- 25-trial sequence generated by the chosen policy (one hypothetical generator is sketched below)
- Balanced positive/negative (13 positive, 12 negative)
Testing
- 24 test trials, ordered randomly, balanced
- No feedback, forced choice
- Amazon Mechanical Turk, $0.25 per participant
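Purely as a hypothetical illustration of how a two-dimensional (fading, repetition/alternation) policy might parameterize a 25-trial training sequence (this generator, its parameter ranges, and its handling of balance are invented, not the authors' actual procedure):

```python
# Hypothetical illustration only: one way a 2D policy could generate a
# 25-trial training sequence. Not the authors' actual sequence generator,
# and the balanced positive/negative constraint is ignored for brevity.
import numpy as np

def training_sequence(fading, repetition, n_trials=25, rng=None):
    """fading in [0, 1]: how far item difficulty ramps from easy toward hard.
    repetition in [0, 1]: probability of repeating the previous category
    rather than alternating between positive and negative examples."""
    rng = rng or np.random.default_rng()
    trials = []
    category = int(rng.integers(2))                  # 0 = negative, 1 = positive
    for t in range(n_trials):
        difficulty = fading * t / (n_trials - 1)     # ramp from easy toward hard
        if t > 0 and rng.random() > repetition:
            category = 1 - category                  # alternate categories
        trials.append((category, difficulty))
    return trials

# Example: fade two-thirds of the way to the hardest items, mostly repetitions.
print(training_sequence(fading=0.66, repetition=0.8)[:5])
```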

Results
Number correct out of 25. Color indicates performance in the 2D policy space: white = better, black = worse.

Best Policy
- Fade from easy to semi-difficult items
- Repetitions initially, alternations later

A bigger effect in the fading (vertical) dimension than in the repetition (horizontal) dimension.

Results

All hard items on the left; all easy items on the right. The peak is fading about two-thirds of the way toward the difficult items.

Final Evaluation

Test accuracy by condition: 65.7% (N=49), 60.9% (N=53), 66.6% (N=50), 68.6% (N=48).
The upper-left condition is reliably worse than the center and optimum conditions by a two-tailed t-test. The upper left is marginally worse than the upper right by a two-tailed t-test (p = .07).
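Those reliability claims come from two-tailed t-tests; a minimal sketch of such a comparison, with fabricated per-participant accuracies rather than the experiment's data, looks like this:

```python
# Minimal sketch of a two-tailed, two-sample t-test between two conditions.
# The per-participant accuracies are fabricated placeholders, not the real data.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
condition_a = rng.binomial(24, 0.61, size=53) / 24   # 24 test trials per participant
condition_b = rng.binomial(24, 0.69, size=48) / 24

t_stat, p_value = ttest_ind(condition_a, condition_b)  # two-tailed by default
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```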

Novel Experimental Paradigm
Instead of running a few conditions, each with many participants, run many conditions, each with a different participant. Although individual participants provide a very noisy estimate of the population mean, optimization techniques allow us to determine the shape of the policy space.

What Next?
- A plea for more interesting policy spaces!
- Other optimization problems
  - Abstract concepts from examples, e.g., irony, recyclability, retribution
  - Motivation
    - Manipulations: rewards/points, trial pace, task difficulty, time pressure
    - Measure: voluntary time on task

Machine Learning To Boost Human Learning
Robert Lindsey*, Jeff Shroyer*, Hal Pashler+, Mike Mozer*

*University of Colorado at Boulder
+University of California, San Diego

People Forget What They Have Learned

This holds regardless of whether it is facts, skills, or concepts.

Forgetting Can Be Reduced By Appropriately Timed Review

This is typically shown with foreign-language vocabulary, but it works for all sorts of material.

Challenge Of Exploiting Spaced Review
The optimal spacing of study depends on:
- characteristics of the individual student
- characteristics of the specific item (e.g., vocabulary word) being learned
- the exact study history (timing and retrieval success)

Our Approach
- Data from a population of students studying a set of items
- Collaborative filtering
- Prediction of when a specific student should study a particular item (a rough sketch follows)
- A psychological model of human memory
This approach is implemented in the Colorado Optimized Language Tutor (COLT).
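As a rough sketch of the kind of per-student, per-item recall prediction such a scheduler relies on (not the actual COLT model; the forgetting-curve form, parameters, and items below are invented), one could rank items for review like this:

```python
# Rough sketch (not the actual COLT model): rank items for review by
# predicted recall probability under a simple forgetting curve whose
# parameters could in principle be personalized per student and per item.
import math

def recall_probability(days_since_study, student_ability, item_difficulty,
                       decay=0.3):
    """Predicted recall probability; higher ability and easier items decay slower."""
    strength = student_ability - item_difficulty        # personalized memory strength
    return 1.0 / (1.0 + math.exp(-(strength - decay * math.log1p(days_since_study))))

def pick_review_item(items, student_ability):
    """Review the item most in danger of being forgotten."""
    return min(items, key=lambda it: recall_probability(it["days"], student_ability,
                                                        it["difficulty"]))

# Hypothetical vocabulary items with days since last study and difficulty.
items = [{"name": "gato", "days": 7, "difficulty": 0.2},
         {"name": "perro", "days": 1, "difficulty": 0.8},
         {"name": "izquierda", "days": 14, "difficulty": 1.1}]
print(pick_review_item(items, student_ability=1.0)["name"])
```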

Experiment In Fall 2012
- Second-year Spanish at a Denver-area middle school
- 180 students (6 class periods)
- New vocabulary introduced each week for 10 weeks
- COLT used 3 times a week for 30 minutes
- Sessions 1 & 2: study new vocabulary to criterion; remainder of time spent on review
- Session 3: quiz on new vocabulary; remainder of time spent on review

Comparison Of Three Review Schedulers (Within Student)
- Massed review (current educational practice)
- Generic spaced review
- Personalized spaced review using machine learning models

Personalized > Massed: 12%; Personalized > Generic: 8%
Personalized > Massed: 17%; Personalized > Generic: 10%

Bottom Line
A 17% boost in retention of cumulative course content one month after the end of the semester, if students spend the same amount of time using our machine-learning-based review software instead of cramming for the current week's exam.

BRAIN Initiative

One goal of combining cognitive modeling and machine learning: help people learn and perform more efficiently.
- Learning new concepts
  - Choice and ordering of examples
- Improving long-term retention
  - Personalized selection of material for review
- Assisting visual search (e.g., medical, satellite image analysis)
  - Image enhancement
- Training complex visual tasks (e.g., fingerprint analysis)
  - Highlighting to guide attention
- Diagnosing and remediating cognitive deficits
  - Via modeling individual differences

BRAIN PITCH