Randomization and Bootstrap Methods in the Introductory Statistics Course
Kari Lock Morgan, Duke University
Robin Lock, St. Lawrence University
Panel at the 2013 Joint Mathematics Meetings, San Diego, CA
How might the Intro Stat curriculum change to accommodate/take advantage of bootstrap/randomization methods?
Intro Stat – Traditional Topics
• Descriptive Statistics – one and two samples
• Normal distributions
• Data production (samples/experiments)
• Sampling distributions (mean/proportion)
• Confidence intervals (means/proportions)
• Hypothesis tests (means/proportions)
• ANOVA for several means, Inference for regression, Chi-square tests
Intro Stat – Revise the Topics
• Descriptive Statistics – one and two samples
• Normal distributions
• Data production (samples/experiments)
• Sampling distributions (mean/proportion)
• Confidence intervals (means/proportions)
• Hypothesis tests (means/proportions)
• ANOVA for several means, Inference for regression, Chi-square tests
Topics moved or added on the slide:
• Data production (samples/experiments)
• Bootstrap confidence intervals
• Randomization-based hypothesis tests
• Normal distributions
• Descriptive Statistics – one and two samples
Why start with Bootstrap CI's?
• Minimal prerequisites:
   – Population parameter vs. sample statistic
   – Random sampling
   – Dotplot (or histogram)
   – Standard deviation and/or percentiles
• Same method of randomization in most cases
   – Sample with replacement from original sample
• Natural progression
   – Sample estimate ==> How accurate is the estimate?
• Intervals are more useful?
   – A good debate for another session…
Example: Mustang Prices
Data: Sample of 25 Mustangs listed on Autotrader.com
Find a confidence interval for the slope of a regression line to predict prices of used Mustangs based on their mileage.
“Bootstrap” Samples
Key idea:
• Sample with replacement from the original sample using the same n.
• Compute the sample statistic for each bootstrap sample.
• Collect lots of such bootstrap statistics.
Imagine the “population” is many, many copies of the original sample.
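The key idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mileage and price values are synthetic stand-ins (the Autotrader sample is not reproduced in these slides), and the helper names `slope` and `bootstrap_slopes` are hypothetical. Resampling whole cars (mileage and price together) is one common way to bootstrap a regression slope.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_slopes(xs, ys, reps=3000, seed=0):
    """Resample (x, y) cases with replacement, same n, and collect the slopes."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < reps:
        idx = [rng.randrange(n) for _ in range(n)]   # n draws with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) > 1:  # slope is undefined if every resampled x is identical
            slopes.append(slope(bx, by))
    return slopes

# Synthetic stand-in data (NOT the Autotrader sample): 25 made-up cars.
data_rng = random.Random(1)
miles = [data_rng.uniform(0, 150) for _ in range(25)]         # assumed: thousands of miles
price = [30 - 0.2 * m + data_rng.gauss(0, 6) for m in miles]  # assumed: thousands of dollars

boot = bootstrap_slopes(miles, price)
print(len(boot), sum(boot) / len(boot))
```

A dotplot or histogram of `boot` plays the role of the bootstrap distribution.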
Distribution of 3000 Bootstrap Slopes
Using the Bootstrap Distribution to Get a Confidence Interval – Version #1
The standard deviation of the bootstrap statistics estimates the standard error of the sample statistic.
Quick interval estimate:
$\text{Original Statistic} \pm 2 \cdot SE$
For the Mustang slope:
$-0.22 \pm 2 \cdot 0.029 = -0.22 \pm 0.058 = (-0.278, -0.162)$
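As a rough sketch, Version #1 amounts to the following. The function names `interval_from_se` and `se_interval` are hypothetical; the only numbers used are the slope (-0.22) and bootstrap SE (0.029) quoted on the slide.

```python
import statistics

def interval_from_se(original_stat, se, multiplier=2):
    """Quick interval: original statistic plus or minus multiplier * SE."""
    margin = multiplier * se
    return original_stat - margin, original_stat + margin

def se_interval(original_stat, boot_stats, multiplier=2):
    """Estimate the SE as the standard deviation of the bootstrap statistics."""
    se = statistics.stdev(boot_stats)
    return interval_from_se(original_stat, se, multiplier)

# Values from the slide: sample slope -0.22, bootstrap SE 0.029.
low, high = interval_from_se(-0.22, 0.029)
print(round(low, 3), round(high, 3))
```

In practice `se_interval` would be given the full list of bootstrap slopes rather than a precomputed SE.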
Using the Bootstrap Distribution to Get a Confidence Interval – Version #2
Keep 95% in middle
Chop 2.5% in each tail
95% CI for slope: (-0.279, -0.163)
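Version #2 can be sketched as sorting the bootstrap statistics and chopping an equal count from each tail. The function name `percentile_interval` is hypothetical, and the convention used here (drop a rounded 2.5% of the values from each end) is only one of several reasonable percentile rules; the demo uses placeholder values, not the Mustang slopes.

```python
def percentile_interval(boot_stats, level=0.95):
    """Keep the middle `level` of the bootstrap statistics; chop the rest evenly from the tails."""
    ordered = sorted(boot_stats)
    n = len(ordered)
    chop = int(round((1 - level) / 2 * n))   # number of values removed from each tail
    return ordered[chop], ordered[n - 1 - chop]

# Placeholder "bootstrap statistics" 0..999, just to show which values are kept.
demo_low, demo_high = percentile_interval(list(range(1000)))
print(demo_low, demo_high)
```

With 1000 placeholder values, 25 are chopped from each tail and the middle 950 remain.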
3. Simulation Technology?
Fall 2010: Fathom
Fall 2011: Fathom & Applets
Tactile simulations first?
Bootstrap – No (with replacement is tough)
Test for an experiment – Yes (1 or 2)
Desirable Technology Features?
Three Distributions
One to Many Samples
4. One Crank or Two?
Confidence Intervals – Bootstrap – one crank
Significance Tests – Two (or more) cranks
Rules for selecting randomization samples for a test. Be consistent with:
1. the null hypothesis
2. the sample data
3. the way data were collected
Randomization Test for Slope
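One way to build randomization samples for a slope test is sketched below. The slides do not say which scheme was used, so treat this as an assumption: under the null hypothesis of no association (slope = 0), the observed prices are shuffled among the observed mileages, which keeps the sample values intact. The data are synthetic stand-ins and the function names are hypothetical.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def randomization_slopes(xs, ys, reps=1000, seed=0):
    """Shuffle ys against xs (null: no association) and collect the slopes."""
    rng = random.Random(seed)
    shuffled = list(ys)
    slopes = []
    for _ in range(reps):
        rng.shuffle(shuffled)          # breaks any x-y link, keeps the same data values
        slopes.append(slope(xs, shuffled))
    return slopes

# Synthetic stand-in data (NOT the Autotrader sample).
data_rng = random.Random(1)
miles = [data_rng.uniform(0, 150) for _ in range(25)]
price = [30 - 0.2 * m + data_rng.gauss(0, 6) for m in miles]

observed = slope(miles, price)
null_slopes = randomization_slopes(miles, price)
# Two-tailed p-value: share of shuffled slopes at least as extreme as the observed one.
p_value = sum(1 for s in null_slopes if abs(s) >= abs(observed)) / len(null_slopes)
print(observed, p_value)
```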
5. Test for a 2x2 Table
First example: A randomized experiment
Test statistic: Count in one cell
Randomize: Treatment groups
Margins: Fix both
Later examples vary, e.g. use difference in proportions or randomize as independent samples with common p.
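The first example can be sketched as follows. The counts are hypothetical (12 subjects per group, 14 successes overall, 10 of them in the treatment group) and the function name is an assumption. Shuffling the fixed set of outcomes and dealing the first 12 to "treatment" re-randomizes the groups while holding both margins fixed, and the statistic is the success count in one cell.

```python
import random

def randomization_counts(n_treat, n_control, total_success, reps=5000, seed=0):
    """Re-randomize treatment groups with both margins fixed; return treatment success counts."""
    rng = random.Random(seed)
    total = n_treat + n_control
    outcomes = [1] * total_success + [0] * (total - total_success)
    counts = []
    for _ in range(reps):
        rng.shuffle(outcomes)                   # random reassignment to groups
        counts.append(sum(outcomes[:n_treat]))  # successes landing in the treatment group
    return counts

# Hypothetical experiment: 10 of 12 successes under treatment, 4 of 12 under control.
observed_count = 10
counts = randomization_counts(n_treat=12, n_control=12, total_success=14)
# One-tailed p-value: share of randomizations with at least as many treatment successes.
p_value = sum(1 for c in counts if c >= observed_count) / len(counts)
print(p_value)
```

Switching the statistic to a difference in proportions only changes the line that computes each count.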
6. What about “traditional” methods?
AFTER students have seen lots of bootstrap and randomization distributions (and hopefully begun to understand the logic of inference)…
• Introduce the normal distribution (and later t)
• Introduce “shortcuts” for estimating SE for proportions, means, differences, …
Back to Mustang Prices
The regression equation is
Price = 30.5 - 0.219 Miles

Predictor      Coef  SE Coef      T      P
Constant     30.495    2.441  12.49  0.000
Miles      -0.21880  0.03130  -6.99  0.000

S = 6.42211   R-Sq = 68.0%   R-Sq(adj) = 66.6%
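As a worked check on the output above, each T value is the coefficient divided by its standard error. The interval on the last line is an addition, not from the slides: it assumes the usual t multiplier with n - 2 = 23 degrees of freedom (t* ≈ 2.069) and is shown only for comparison with the bootstrap percentile interval (-0.279, -0.163).

```latex
\begin{align*}
T_{\text{Miles}} &= \frac{-0.21880}{0.03130} \approx -6.99, \qquad
T_{\text{Constant}} = \frac{30.495}{2.441} \approx 12.49 \\
\text{95\% CI for slope} &\approx -0.2188 \pm 2.069 \cdot 0.0313
  = -0.2188 \pm 0.0648 \approx (-0.284,\ -0.154)
\end{align*}
```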
7. Assessment?
New learning goals
• Understand how to generate bootstrap samples and distribution.
• Understand how to create randomization samples and distribution.
• Be able to use a bootstrap/randomization distribution to find an interval/p-value.
8. How did it go?
• Students enjoyed and were engaged with the new approach.
• Instructor enjoyed and was engaged with the new approach.
• Better understanding of p-value reflecting “if H0 is true”.
• Better interpretations of intervals.
• Challenge: Few “experienced” students to serve as resources.
Going forward
Continue with randomization approach?
ABSOLUTELY (3 sections in Fall 2011)