Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate...

23
Lecture 3: Baseball stats & Multivariate regression Skidmore College, MA 276

Transcript of Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate...

Page 1: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Lecture 3: Baseball stats & Multivariateregression

Skidmore College, MA 276

Page 2: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Goals

I Player specific measures, baseball pitching statisticsI Example: Fielding independent pitchingI Extensions: WAR, deserved run averageI Tools: multivariate regression, player prediction

Page 3: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Review: multivariate regression

Model:

yi = β0 + β1 ∗ xi1 + β2 ∗ xi2 + . . .+ βp−1 ∗ xi,p−1 + εi

Assumptions:

I εi ∼ N(0, σ2)I εi ,εi′ independent for all i , i ′I Linear relationship between y and x

Page 4: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Review: multivariate regression

Estimated model:

yi = β0 + β1 ∗ xi1 + β2 ∗ xi2 + . . .+ ˆβp−1 ∗ xi,p−1

Interpretations:

I β0:I β1:

Page 5: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Back to baseball

Fielding Independent Pitching (FIP)

1. Is it important to success?2. How well does it measure a player’s contribution?3. Is it repeatable?

Page 6: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Ex: FIP

“measurement of a pitcher’s performance that strips out the role ofdefense, luck, and sequencing”- FanGraphs

I defenseI luckI sequencing

FIP = (13∗HR)+(3∗(BB+HBP))−(2∗K)IP + constant

Page 7: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Ex: FIP

library(Lahman)library(mosaic)data(Teams)

#filter: only look at years starting with 1970Teams.1 <- filter(Teams, yearID >= 1970)

#mutate: create new variable FIPTeams.1 <- mutate(Teams.1,

FIP = ((13*HRA) + 3*(BBA) - 2*SOA)/IPouts)

Page 8: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Ex: FIP

fit.pitcher <- lm(RA ~ HRA + BBA + SOA, data = Teams.1)

Write the multiple regression model:

Page 9: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Ex: FIP

msummary(fit.pitcher)

## Estimate Std. Error t value Pr(>|t|)## (Intercept) 205.176535 11.736668 17.48 <2e-16 ***## HRA 2.007380 0.047354 42.39 <2e-16 ***## BBA 0.571729 0.020551 27.82 <2e-16 ***## SOA -0.089678 0.008745 -10.26 <2e-16 ***#### Residual standard error: 49.74 on 1230 degrees of freedom## Multiple R-squared: 0.7612, Adjusted R-squared: 0.7607## F-statistic: 1307 on 3 and 1230 DF, p-value: < 2.2e-16

Page 10: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Ex: FIP

qqnorm(fit.pitcher$resid)

−3 −2 −1 0 1 2 3

−15

0−

5050

150

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Page 11: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Ex: FIP

xyplot(fit.pitcher$resid ~ fit.pitcher$fitted)

fit.pitcher$fitted

fit.p

itche

r$re

sid

−100

0

100

400 600 800 1000

Page 12: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Ex: FIP

Write the estimated regression model, interpret slope for HRA

msummary(fit.pitcher)

## Estimate Std. Error t value Pr(>|t|)## (Intercept) 205.176535 11.736668 17.48 <2e-16 ***## HRA 2.007380 0.047354 42.39 <2e-16 ***## BBA 0.571729 0.020551 27.82 <2e-16 ***## SOA -0.089678 0.008745 -10.26 <2e-16 ***#### Residual standard error: 49.74 on 1230 degrees of freedom## Multiple R-squared: 0.7612, Adjusted R-squared: 0.7607## F-statistic: 1307 on 3 and 1230 DF, p-value: < 2.2e-16

Page 13: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Ex: FIP

Teams.1 <- mutate(Teams.1,FIP.new = predict(fit.pitcher, Teams.1))

mat <- select(Teams.1, RA, FIP, FIP.new, HRA, BBA, SOA)cor.matrix <- cor(mat)round(cor.matrix, 2)

## RA FIP FIP.new HRA BBA SOA## RA 1.00 0.70 0.87 0.77 0.64 0.19## FIP 0.70 1.00 0.82 0.67 0.51 -0.33## FIP.new 0.87 0.82 1.00 0.88 0.73 0.22## HRA 0.77 0.67 0.88 1.00 0.37 0.40## BBA 0.64 0.51 0.73 0.37 1.00 0.18## SOA 0.19 -0.33 0.22 0.40 0.18 1.00

Page 14: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Ex: FIP

library(corrplot)corrplot(cor.matrix, method="number", type = "lower")

1

0.7

0.87

0.77

0.64

0.19

1

0.82

0.67

0.51

−0.33

1

0.88

0.73

0.22

1

0.37

0.4

1

0.18 1

−1−0.8−0.6−0.4−0.20 0.20.40.60.8 1

RA

FIP

FIP

.new

HR

A

BB

A

SO

A

RA

FIP

FIP.new

HRA

BBA

SOA

Page 15: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Ex: FIP

Conclusions:

I Multiple regression model estimated FIP (FIP.new) closelyapproximates actual FIP formula

I High link (team-wide) between FIP and RA

Page 16: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

To the players!

data(Pitching)head(Pitching)

## playerID yearID stint teamID lgID W L G GS CG SHO SV IPouts H ER## 1 bechtge01 1871 1 PH1 NA 1 2 3 3 2 0 0 78 43 23## 2 brainas01 1871 1 WS3 NA 12 15 30 30 30 0 0 792 361 132## 3 fergubo01 1871 1 NY2 NA 0 0 1 0 0 0 0 3 8 3## 4 fishech01 1871 1 RC1 NA 4 16 24 24 22 1 0 639 295 103## 5 fleetfr01 1871 1 NY2 NA 0 1 1 1 1 0 0 27 20 10## 6 flowedi01 1871 1 TRO NA 0 0 1 0 0 0 0 3 1 0## HR BB SO BAOpp ERA IBB WP HBP BK BFP GF R SH SF GIDP## 1 0 11 1 NA 7.96 NA NA NA 0 NA NA 42 NA NA NA## 2 4 37 13 NA 4.50 NA NA NA 0 NA NA 292 NA NA NA## 3 0 0 0 NA 27.00 NA NA NA 0 NA NA 9 NA NA NA## 4 3 31 15 NA 4.35 NA NA NA 0 NA NA 257 NA NA NA## 5 0 3 0 NA 10.00 NA NA NA 0 NA NA 21 NA NA NA## 6 0 0 0 NA 0.00 NA NA NA 0 NA NA 0 NA NA NA

Page 17: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

To the players!

Pitchers.1 <- filter(Pitching, yearID >= 1970, IPouts > 200)Pitchers.1 <- mutate(Pitchers.1,

FIP = ((13*HR) + 3*(BB + HBP) - 2*SO)/IPouts + 3.1)Pitchers.1[3,]

## playerID yearID stint teamID lgID W L G GS CG SHO SV IPouts H ER## 3 bahnsst01 1970 1 NYA AL 14 11 36 35 6 2 0 698 227 86## HR BB SO BAOpp ERA IBB WP HBP BK BFP GF R SH SF GIDP FIP## 3 23 75 116 0.25 3.33 4 3 2 0 977 0 100 NA NA NA 3.526934

Page 18: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Link between ERA, FIPWhat accounts for uncertainty?

xyplot(ERA ~ FIP, data = Pitchers.1)

FIP

ER

A

2

4

6

8

10

2.5 3.0 3.5 4.0 4.5

Page 19: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Link to the future

All of the above is within a single year. How does it project to future years?

Pitchers.2 <- Pitchers.1 %>%arrange(playerID, yearID) %>%group_by(playerID) %>%mutate(f.ERA = lead(ERA), f.FIP = lead(FIP))

Page 20: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Link to the futurecor.matrix <- cor(select(ungroup(Pitchers.2),

f.ERA, f.FIP,ERA, FIP, HR, SO, BB),use="pairwise.complete.obs")

corrplot(cor.matrix, method = "number")

1

0.75

0.33

0.36

0.22

−0.09

0.1

0.75

1

0.31

0.46

0.2

−0.2

0.12

0.33

0.31

1

0.74

0.33

−0.23

0.09

0.36

0.46

0.74

1

0.42

−0.36

0.17

0.22

0.2

0.33

0.42

1

0.53

0.52

−0.09

−0.2

−0.23

−0.36

0.53

1

0.63

0.1

0.12

0.09

0.17

0.52

0.63

1−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

f.ER

A

f.FIP

ER

A

FIP

HR

SO

BB

f.ERA

f.FIP

ERA

FIP

HR

SO

BB

Page 21: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Conclusions

Fielding Independent Pitching (FIP)

1. Is it important to success?

I Yes. Why?

2. How well does it measure a player’s contribution?

I Preferred to ERA - Why?

3. Is it repeatable?

I Moreso than other metrics - Why?

Page 22: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Just for fun

Pitchers.2 %>%group_by(playerID) %>%tally(G) %>%top_n(5)

## Source: local data frame [5 x 2]#### playerID n## (chr) (int)## 1 fingero01 803## 2 houghch01 771## 3 riverma01 752## 4 rogerke01 751## 5 tekulke01 923

Page 23: Lecture 3: Baseball stats & Multivariate regression · Lecture 3: Baseball stats & Multivariate regression Author: Skidmore College, MA 276 Created Date: 2/5/2016 5:31:16 PM ...

Just for funPitchers.2 %>%

filter(IPouts > 600) %>%group_by(yearID) %>%slice(which.min(FIP)) %>%tail() %>%select(playerID, yearID, ERA, FIP, W, L)

## Source: local data frame [6 x 6]## Groups: yearID [6]#### playerID yearID ERA FIP W L## (chr) (int) (dbl) (dbl) (int) (int)## 1 greinza01 2009 2.16 2.844186 16 8## 2 wainwad01 2010 2.42 3.026194 20 11## 3 hallaro01 2011 2.35 2.824679 19 6## 4 hernafe02 2012 3.06 3.013793 13 9## 5 kershcl01 2013 1.83 2.879661 16 9## 6 klubeco01 2014 2.44 2.838331 18 9