CS 149: Understanding Statistics Using Baseball

Page 1: CS 149: Understanding Statistics Using Baseball · emeyers.scripts.mit.edu/emeyers/wp-content/uploads/CS149... · 2017-08-25

Outline for today

Introductions

Go over syllabus

The history of statistics in baseball

Baseball statistics and structured data

Introductions

About you:

•  Name
•  Preferred gender pronouns
•  Year at Hampshire: Div I, II-first year, II-second year, III
•  A bit about your background

•  E.g., experience with baseball, programming, and Statistics

•  Anything else you want to say

Is there any topic you are particularly interested in? (i.e., why are you interested in this class?)

General Information

My contact info: Email: [email protected]

Phone: x5500
Office: ASH 133

Drop-in office hours:

•  Monday 2:30-3:30 pm
•  Tuesday 12:30-1:30
•  By appointment

Website: https://moodle.hampshire.edu/course/view.php?id=5532

Course objectives

1. To understand how to use statistical methods and thinking to make sense of baseball and other systems that involve random processes
2. To be able to use the R programming language to analyze data

Note: statistics vs. Statistics

statistics: numerical summaries of data

Statistics: the mathematics of collecting, organizing, and interpreting data

Why use baseball to study Statistics?

High degree of randomness
•  Very good players hit safely 3 out of 10 times (ave = .300)
•  Bad players hit safely 2 out of 10 times (ave = .200)

Contains a rich structure that repeats, which makes it possible to isolate components and analyze them

•  Discrete events make it relatively easy to analyze:
•  Pitches -> plate appearances -> innings -> games -> seasons -> $

Lots of data available
•  Data going back to 1871

Overall: an excellent system for practicing using data to answer real questions
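The randomness point can be made concrete with a quick simulation. This is a sketch under a strong simplifying assumption: every at-bat is an independent trial with a fixed hit probability (.300 for the good hitter, .200 for the bad one). The function name, the 50 at-bat window, the number of trials, and the seed are hypothetical choices, not anything from the course.

```python
import random

def prob_worse_hitter_wins(p_good=0.300, p_bad=0.200, at_bats=50,
                           trials=20_000, seed=149):
    """Estimate how often the weaker hitter gets MORE hits than the
    stronger hitter over the same number of at-bats.

    Assumption: each at-bat is an independent Bernoulli trial with a
    fixed probability of a hit (real at-bats are messier than this).
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    worse_wins = 0
    for _ in range(trials):
        # Count hits: each at-bat is a hit when a uniform draw falls below p.
        good_hits = sum(rng.random() < p_good for _ in range(at_bats))
        bad_hits = sum(rng.random() < p_bad for _ in range(at_bats))
        if bad_hits > good_hits:
            worse_wins += 1
    return worse_wins / trials

print(prob_worse_hitter_wins())
```

By a normal approximation (a 5-hit expected gap with a standard deviation of about 4.3 hits), the .200 hitter should out-hit the .300 hitter in roughly one of every ten 50 at-bat stretches, so a short run of games says little about true ability.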

Background knowledge

No prerequisites

Familiarity with the basic rules of baseball will be useful
•  We will briefly go over the rules and watch an inning of baseball next class
•  We can also arrange additional time for students who want it

Required Textbooks

Teaching Statistics Using Baseball (TSUB)
Big Data Baseball (BDB)

Highly recommended textbook

Analyzing Baseball Data with R
•  Available as an e-book through Hampshire

Other recommended books

Moneyball
The Sabermetric Revolution

Topics covered

Visualizing and exploring a single batch of data

Comparing batches of data and standardization

Relationships between measured variables

Introduction to probability

Introduction to statistical inference
•  Hypothesis tests and confidence intervals

If there is time:
•  Modeling baseball games
•  Other sabermetric topics

Types of questions we will be trying to answer using statistics

How much more valuable is a home run compared to a single?

Which statistics best capture a baseball player’s ability?
•  I.e., is on-base percentage a better measure than batting average?

Who is the best baseball hitter of all time?

Are certain baseball players streaky or clutch hitters?

Did the Tigers make a good decision signing Miguel Cabrera for $292 million over the next 8 years?
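For reference, the on-base percentage question compares two standard formulas: batting average is hits divided by at-bats, while OBP also credits walks and hit-by-pitches. A small sketch of both; the function names and the two hitters' numbers are hypothetical, chosen only to show how two players with the same batting average can differ in OBP.

```python
def batting_average(h: int, ab: int) -> float:
    """Batting average: hits per official at-bat (walks are not at-bats)."""
    return h / ab if ab else 0.0

def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP: times reaching base by hit, walk (bb), or hit-by-pitch (hbp),
    divided by at-bats + walks + hit-by-pitches + sacrifice flies (sf)."""
    denom = ab + bb + hbp + sf
    return (h + bb + hbp) / denom if denom else 0.0

# Two hypothetical hitters with identical batting averages;
# hitter B draws many more walks.
a = dict(h=150, bb=20, hbp=2, ab=500, sf=5)
b = dict(h=150, bb=90, hbp=2, ab=500, sf=5)
for name, s in (("A", a), ("B", b)):
    print(name,
          round(batting_average(s["h"], s["ab"]), 3),
          round(on_base_percentage(**s), 3))
# A 0.3 0.326
# B 0.3 0.405
```

Batting average cannot tell these two hitters apart; OBP can, which is one reason the question above is worth asking.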

Class assignments

1. Weekly worksheets that involve R and other problems
•  Assigned on Thursdays and due the following Sunday at midnight

•  Also class readings from Teaching Statistics Using Baseball

2. Weekly ‘quote and reactions’ to Big Data Baseball chapters

3. One ‘better know a player’ presentation (on Mondays)
•  5-10 minute presentation about a baseball/softball player (or another topic related to the class)

4. A final project where you analyze data related to baseball (or the class more generally) and write up a 5-10 page paper about the results

5. A class presentation about your final project

Policies

Very important to turn your work in on time!
•  Three strikes late policy
•  If you turn in three assignments late, you don’t get an evaluation

Attend class

You can use a laptop for notes, but obviously do not use it for anything not related to the class

Check your Hampshire email and/or set up mail forwarding. I will send announcements to your Hampshire email account.

Policies

Incompletes will only be given out in very exceptional circumstances

Special needs: please let me know if you have a disability or special need. It would also be good to talk to Aaron Ferguson – x5498

Academic dishonesty: you can work on the worksheets with others, but the work you turn in needs to be your own (i.e., you need to understand the concepts).

To make this class a success…

Actively engaged in the class
•  Do the readings and join in the discussions

Community learning environment
•  Share interesting things you’ve learned
•  I want to learn from you too

Following your interests
•  I can adapt the class to people’s interests

Class survey

In order for me to get to know you and to better adjust the class to your interests, please fill out the survey

The history of baseball statistics

Early statistics

Henry Chadwick (1824-1908) created the first box score in an 1859 issue of the Clipper.

•  First to use K for strikeout; said to have invented batting average and earned run average

•  Did not record walks because he did not feel they reflected a batter’s skill

Classic sta2s2cs

Most prominent hitting statistics:
•  Batting average, RBIs, and home runs

Most prominent pitching statistics:
•  Wins, earned run average, strikeouts
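Earned run average is usually defined as earned runs allowed per nine innings pitched. A minimal sketch, assuming innings arrive in the common box-score notation where ".1" and ".2" mean one and two outs (so "6.1" is 6 1/3 innings, not 6.1); the function names and the 60-earned-run example are hypothetical.

```python
def innings_from_notation(ip: str) -> float:
    """Convert box-score innings notation to true innings.

    The digit after the point counts outs, so '6.1' means 6 1/3
    innings and '6.2' means 6 2/3 innings.
    """
    whole, _, outs = ip.partition(".")
    outs_int = int(outs) if outs else 0
    if outs_int not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + outs_int / 3

def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """ERA: earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / innings_pitched

ip = innings_from_notation("200.1")           # 200 1/3 innings
print(round(earned_run_average(60, ip), 2))   # 2.7
```

Treating "200.1" as the decimal 200.1 would give a slightly different ERA, which is why the sketch converts the notation first.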

Sabermetrics

Questioned the usefulness of traditional measures of performance, such as batting average or pitcher wins

Sabermetrics definitions:

1. The empirical or mathematical/Statistical study of baseball
2. “the search for objective knowledge about baseball”

- Bill James

Name comes from the ‘Society for American Baseball Research’ (SABR), a group started in 1971

•  Before computers, all information had to be compiled from box scores by hand, since there was no encyclopedia with game-by-game data

Sabermetrics was first widely introduced to the public in 1982 with the publication of the Bill James Baseball Abstract

Baseball data sets

Lahman Database: Season-by-season data

Retrosheet: Game-by-game data

Retrosheet: Play-by-play data

PITCHf/x: Pitch-by-pitch data (location, pitch type)

A few prominent Sabermetric publications/websites

Society for American Baseball Research

Bill James Online

Baseball Analysts

Baseball Prospectus

Beyond the Box Score

FanGraphs

The Hardball Times

Tango Tiger

Moneyball

The story of how Billy Beane, the general manager of the A’s, was able to put together a top-ranked team in 2002 on a tight budget by finding undervalued players using advanced statistics

Some of the book’s claims might be exaggerated, but the book had a big impact on the expansion of advanced data analysis by Major League clubs.

Big Data Baseball

More recent sabermetric advances
•  2013 Pittsburgh Pirates

Increasing analytics to gain an edge

http://online.wsj.com/news/article_email/baseballs-science-experiment-1411135882-lMyQjAxMTA0NzE3OTIxNDkwWj

Common baseball statistics

Let’s look at some baseball cards…

Structured data

Lahman Database – Individual player yearly batting statistics

(Table image: rows labeled Cases, columns labeled Variables)

Data taken from the Lahman Batting data set
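In a structured data set, each row is a case and each column is a variable. A minimal Python sketch of that layout, assuming a list of dictionaries as the table; the keys copy a few Lahman Batting column names, but the player IDs, team IDs, and numbers are invented for illustration, and the helper functions are hypothetical.

```python
# A table as a list of dicts: each dict is one case (a player-season),
# each key is one variable. Keys mirror a few Lahman Batting columns;
# the IDs and values are made up, not real statistics.
batting = [
    {"playerID": "player01", "yearID": 2014, "teamID": "TM1", "AB": 550, "H": 165, "HR": 30},
    {"playerID": "player02", "yearID": 2014, "teamID": "TM2", "AB": 480, "H": 130, "HR": 12},
    {"playerID": "player03", "yearID": 2014, "teamID": "TM1", "AB": 300, "H": 75, "HR": 5},
]

def get_case(rows, i):
    """Return one case (row): every variable for a single player-season."""
    return rows[i]

def get_variable(rows, name):
    """Return one variable (column): its value for every case."""
    return [row[name] for row in rows]

print(len(batting), "cases")               # 3 cases
print(list(batting[0].keys()))             # the variables
print(get_case(batting, 1)["teamID"])      # TM2
print(get_variable(batting, "HR"))         # [30, 12, 5]
```

An R data frame uses the same layout: one row per case, one column per variable.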

Example Dataset – Individual player yearly statistics

(Table image: rows labeled Cases, columns labeled Variables)

Categorical and Quantitative Variables

(Table image: rows labeled Cases; example columns labeled Categorical Variable and Quantitative Variable)
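Whether a variable is categorical or quantitative is ultimately a judgment about what its values mean, but a rough first pass can be automated. A sketch, assuming "every value is a number" as the test for quantitative; the helper name and the team columns are hypothetical.

```python
from numbers import Number

def variable_type(values):
    """Heuristic: 'quantitative' if every value is a number (booleans
    excluded), otherwise 'categorical'. Numeric codes such as IDs or
    jersey numbers would still need a human judgment call."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "categorical"

# Hypothetical columns from a team table.
columns = {
    "team": ["TM1", "TM2", "TM3"],
    "league": ["AL", "NL", "AL"],
    "home_runs": [180, 145, 201],
}
for name, col in columns.items():
    print(name, variable_type(col))
# team categorical
# league categorical
# home_runs quantitative
```

The caveat in the docstring matters: a column of jersey numbers passes the numeric test but is really categorical, since averaging jersey numbers means nothing.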

Explanatory and Response Variables

Sometimes we use one variable (the explanatory variable) to understand/predict another variable (the response variable)
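A common way to use an explanatory variable to predict a response is to fit a straight line by least squares. This sketch uses made-up, perfectly linear numbers (run differential as the explanatory variable, wins as the response) so the fitted values are easy to check by hand; it illustrates the idea and is not a model fitted to real teams. The function name is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y ≈ intercept + slope * x,
    where x is the explanatory variable and y is the response."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance-style sum over variance-style sum.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical teams: run differential (explanatory), wins (response).
run_diff = [-100, -50, 0, 50, 100]
wins = [71, 76, 81, 86, 91]
b0, b1 = fit_line(run_diff, wins)
print(b0, b1)   # 81.0 0.1
```

In this invented data each extra 10 runs of differential predicts one more win; with real data the points scatter around the line, and measuring that scatter is part of the later material on relationships between variables.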

Another Dataset – 2014 Team statistics

(Table image: rows labeled Cases, columns labeled Variables)

Next class – examining a single batch of data

Fill out the survey online! (link is on Moodle)

Read chapter 1 of Big Data Baseball and post a quote and reaction to the Moodle forum by midnight on Sunday

Read chapter 1 of Teaching Statistics Using Baseball