Football in the 90s Curtis Olswold University of Iowa 22S Honors Project.


Sunday Afternoon Ritual

• A team’s strategy, or tactical approach to a game, is not unique.

• There are three dominant types of offense in the NFL:

1) Pass oriented

2) Run oriented

3) Balanced (Run and Pass oriented)

Purpose

• Construct a model to predict the points a team scores.

• Determine the probability that a team wins given certain factors.

• Investigate whether or not there exists a significant difference in the points a team scores by year and week.

Why Am I Doing This?

• Determine which variables affect the number of points a team scores, and how.

• Measure how effective those variables are at determining the outcome of the game: win or lose.

• Rules change almost annually in the NFL to increase the number of points teams score. Is it really working?

The Variables

• Score

• Rushing Yards

• Passing Yards

• Completions

• Outcome

• Passing Attempts

• Rushing Attempts

• Interceptions

• Fumbles

The Sample

• A random sample was drawn from the population of all regular-season weeks from the 1990 season through the 1999 season.

• Individual team names were not identified.

• From each week of each year, 5 teams were randomly chosen. This gave a sample of 850 observations (10 seasons × 17 weeks × 5 teams).

Regression Model

$$\text{Score} = 4.92593 + 0.15441\,(\text{Rushing Attempts}) + 0.05858\,(\text{Rushing Yards}) - 0.43734\,(\text{Passing Attempts}) + 0.20470\,(\text{Pass Completions}) + 0.08080\,(\text{Passing Yards}) - 0.56074\,(\text{Interceptions}) - 1.09088\,(\text{Fumbles})$$
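The fitted equation can be evaluated directly for any game. Below is a minimal Python sketch; the function name `predicted_score` and its argument names are our own, and the coefficients are copied from the equation as printed (rounded to five decimals).

```python
# Coefficients of the fitted score model, copied from the slide (rounded as printed).
COEFFICIENTS = {
    "intercept": 4.92593,
    "rush_att": 0.15441,
    "rush_yds": 0.05858,
    "pass_att": -0.43734,
    "completions": 0.20470,
    "pass_yds": 0.08080,
    "interceptions": -0.56074,
    "fumbles": -1.09088,
}


def predicted_score(rush_att, rush_yds, pass_att, completions, pass_yds,
                    interceptions, fumbles):
    """Return the model's predicted (mean) points for one team in one game."""
    c = COEFFICIENTS
    return (c["intercept"]
            + c["rush_att"] * rush_att
            + c["rush_yds"] * rush_yds
            + c["pass_att"] * pass_att
            + c["completions"] * completions
            + c["pass_yds"] * pass_yds
            + c["interceptions"] * interceptions
            + c["fumbles"] * fumbles)


# Pass-oriented game: 12 rushes for 75 yards, 35 passes with 19 completions
# for 315 yards, 2 interceptions, no fumbles.
print(round(predicted_score(12, 75, 35, 19, 315, 2, 0), 3))  # prints 24.085
```

Because the model is linear, each input enters the prediction independently, so changing one statistic shifts the score by exactly its coefficient times the change.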

Statistics of the Model

• R² is the proportion of variability in Score that is explained by the model.

• Adjusted R² also accounts for the number of predictors: it penalizes the model for unnecessary complexity.

• For this model: R-Square = 0.5020, Adj R-Sq = 0.4978.

• This indicates the model explains just over 50% of the variability in score, and the small drop from R² to adjusted R² suggests the model is not overly complex.
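The two statistics are linked by the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), with n observations and p predictors. A quick check in Python, assuming n = 850 games and p = 7 (the seven predictors in the equation); the function name is our own:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# R-square as printed in the output; n = 850 games, p = 7 predictors.
adj = adjusted_r2(0.5020, n=850, p=7)
print(f"{adj:.4f}")  # prints 0.4979
```

The result differs from the printed 0.4978 only in the last digit, presumably because the R-square fed in here is itself rounded to four decimals.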

Significance of Predictors

                      Parameter    Standard
Variable     DF       Estimate     Error       t Value    Pr > |t|
Intercept     1        4.92593     1.66448        2.96      0.0032
ratt          1        0.15441     0.04725        3.27      0.0011
ryds          1        0.05858     0.00785        7.47      <.0001
patt          1       -0.43734     0.06043       -7.24      <.0001
comp          1        0.20470     0.10039        2.04      0.0418
pyds          1        0.08080     0.00534       15.12      <.0001
int           1       -0.56074     0.23748       -2.36      0.0184
fumble        1       -1.09088     0.25782       -4.23      <.0001

Interpretation of Significance

• Every parameter is significantly different from zero at the 0.05 level (the largest p-value is 0.0418, for completions). This means each variable adds to the precision of the model, given the other variables already in it.

• The significance level is 0.05: if a parameter were truly zero (that is, if the variable did not help in prediction), there would be only a 5% chance of wrongly rejecting that hypothesis.
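The t values in the table are simply Estimate / Standard Error. The sketch below recomputes them and, as a simplifying assumption, approximates the two-sided p-values with the standard normal distribution instead of the t distribution with 842 error degrees of freedom (850 observations minus 8 estimated parameters); at that many degrees of freedom the two are very close. The dictionary and function names are our own.

```python
import math

# (estimate, standard error) pairs copied from the "Significance of Predictors" table.
ESTIMATES = {
    "Intercept": (4.92593, 1.66448),
    "ratt": (0.15441, 0.04725),
    "ryds": (0.05858, 0.00785),
    "patt": (-0.43734, 0.06043),
    "comp": (0.20470, 0.10039),
    "pyds": (0.08080, 0.00534),
    "int": (-0.56074, 0.23748),
    "fumble": (-1.09088, 0.25782),
}

ALPHA = 0.05


def two_sided_normal_p(t):
    """Two-sided p-value from the standard normal distribution.

    Approximation: the exact test uses a t distribution with 842 error
    degrees of freedom, which is nearly normal at that size.
    """
    return math.erfc(abs(t) / math.sqrt(2))


for name, (est, se) in ESTIMATES.items():
    t = est / se
    p = two_sided_normal_p(t)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{name:9s} t = {t:6.2f}  p ~ {p:.4f}  {verdict}")
```

The approximate p-values land slightly below the printed ones (for example about 0.041 versus 0.0418 for `comp`), as expected from the normal's thinner tails, but every predictor stays below 0.05.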

Interpretation of the Model

• Holding the other variables constant, each additional rush attempt raises the predicted score by about 3/20 of a point (0.154).

• Every yard a team gains on the ground adds about 3/50 of a point (0.059).

• Each fumble lowers the predicted score by about 1 and 9/100 points (1.091).

Interpretation of the Model (continued)

• As a team throws the ball more, each pass attempt lowers its predicted score by about 11/25 of a point (0.437).

• However, each completion raises the predicted point total by about 1/5 of a point (0.205).

• Every yard gained by completing a pass adds about 2/25 of a point (0.081).

• Each interception lowers the predicted score by just over ½ of a point (0.561).


All of this can be summed up quite simply: a rushing team is superior to a passing team. This is magnified if the team is able to gain substantial yardage per rush. Conversely, if a team passes many times but completes a good percentage of those passes for good yardage, the negative effect of the pass-attempt coefficient is largely offset.
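Because the model is additive, adding one play to a team's game totals changes the prediction by a fixed amount per statistic, holding everything else constant. The sketch below uses that reading to compare a single rush with a single pass. Treating the per-game model play by play is an interpretive assumption, and the function names are hypothetical.

```python
# Coefficients from the fitted score model (rounded as printed).
B_RUSH_ATT, B_RUSH_YDS, B_FUMBLE = 0.15441, 0.05858, -1.09088
B_PASS_ATT, B_COMP, B_PASS_YDS, B_INT = -0.43734, 0.20470, 0.08080, -0.56074


def rush_play_effect(yards, fumbled=False):
    """Change in predicted score from adding one rush to a team's game totals."""
    effect = B_RUSH_ATT + B_RUSH_YDS * yards
    if fumbled:
        effect += B_FUMBLE
    return effect


def pass_play_effect(completed, yards=0, intercepted=False):
    """Change in predicted score from adding one pass attempt to a team's game totals."""
    effect = B_PASS_ATT
    if completed:
        effect += B_COMP + B_PASS_YDS * yards
    if intercepted:
        effect += B_INT
    return effect


# Yards a completion must gain before the pass attempt adds to the predicted score.
break_even_yards = -(B_PASS_ATT + B_COMP) / B_PASS_YDS

print(f"4-yard rush:        {rush_play_effect(4):+.3f}")
print(f"incomplete pass:    {pass_play_effect(False):+.3f}")
print(f"10-yard completion: {pass_play_effect(True, 10):+.3f}")
print(f"break-even completion length: {break_even_yards:.2f} yards")
```

Under this reading an incomplete pass costs about 0.44 predicted points, while a completion of roughly 3 or more yards more than pays for its own attempt, which is the trade-off described above.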

Examples of Prediction

Actual   Predicted   95% Confidence Limits
Score    Value       for the Mean
19       22.4272     21.1840   23.6705
24       26.3893     24.8230   27.9555
13        5.8283      4.3773    7.2793
14        8.6153      6.3462   10.8845
11       14.5838     13.7462   15.4215
13       14.6197     13.3944   15.8451
20       24.8587     23.7494   25.9680
45       31.7385     30.6684   32.8086
20       15.1015     13.7705   16.4324

Explanation of the Predictions

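• The above are the predictions for 9 observations of the sample.

• None are exact, which should not be expected.

• Six of the nine predictions are within 5 points of the actual score; the largest miss is the 45-point game, predicted at about 31.7. The 95% limits describe the average score of all games with the same statistics, not the score of a single game, so an actual score can fall outside them.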
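The residuals (actual minus predicted) for these nine games can be tabulated directly from the printed values. A small Python sketch; the list name is our own:

```python
# (actual, predicted, lower 95% limit for mean, upper 95% limit for mean), from the table.
PREDICTIONS = [
    (19, 22.4272, 21.1840, 23.6705),
    (24, 26.3893, 24.8230, 27.9555),
    (13, 5.8283, 4.3773, 7.2793),
    (14, 8.6153, 6.3462, 10.8845),
    (11, 14.5838, 13.7462, 15.4215),
    (13, 14.6197, 13.3944, 15.8451),
    (20, 24.8587, 23.7494, 25.9680),
    (45, 31.7385, 30.6684, 32.8086),
    (20, 15.1015, 13.7705, 16.4324),
]

residuals = [actual - pred for actual, pred, _, _ in PREDICTIONS]
mae = sum(abs(r) for r in residuals) / len(residuals)
inside = sum(lo <= actual <= hi for actual, _, lo, hi in PREDICTIONS)

print("residuals:", [round(r, 2) for r in residuals])
print(f"mean absolute error: {mae:.2f} points")
print(f"actual scores inside the mean's 95% limits: {inside} of {len(PREDICTIONS)}")
```

For these nine games the residuals average about 5.2 points in absolute value, and none of the nine actual scores lies inside its interval for the mean.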

[Figure: Residual (prediction error) vs. predicted score]

Test of First and Second Moment Specification

DF   Chi-Square   Pr > ChiSq
35   37.52        0.3543

Diagnostic Checking

• Residual, or prediction error:

1) Constant variance: The plot shows that the spread of the residuals is slightly megaphone-shaped. The formal test of Cook and Weisberg (1) indicates that the null hypothesis of constant variance cannot be rejected.

2) Normality: In large samples such as this one, the regression coefficients do not rely upon the residual normality assumption to be asymptotically normal (2).

Diagnostics Continued

Variable    Variance Inflation
Intercept   0
ratt        2.71820
ryds        2.49829
patt        4.37648
comp        5.67120
pyds        2.57837
int         1.18387
fumble      1.02425

• The variance inflation factor (VIF) is a measure of multicollinearity (a linear relationship among two or more predictors).

• None of these indicates severe multicollinearity; all are well below 10, a common rule-of-thumb threshold.
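The VIF for predictor j equals 1 / (1 - R_j²), where R_j² is the R² from regressing that predictor on all the other predictors, so each printed value can be turned back into an R_j². A short Python sketch using the output's own variable names; the cutoff of 10 is a rule of thumb, not a formal test:

```python
# Variance inflation factors from the output (the intercept's 0 is omitted).
VIF = {
    "ratt": 2.71820, "ryds": 2.49829, "patt": 4.37648, "comp": 5.67120,
    "pyds": 2.57837, "int": 1.18387, "fumble": 1.02425,
}

for name, vif in VIF.items():
    # VIF_j = 1 / (1 - R_j^2)  =>  R_j^2 = 1 - 1 / VIF_j
    r2_j = 1 - 1 / vif
    flag = "severe" if vif > 10 else "ok"
    print(f"{name:7s} VIF = {vif:6.3f}  R_j^2 = {r2_j:.3f}  {flag}")
```

The largest value, for completions, means about 82% of that predictor's variation is explained by the others, plausibly because completions cannot exceed pass attempts; that is still well short of the severe range.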

Example of a Run vs. Pass Oriented Offense

• If a team rushes the ball 35 times for 150 yards with 1 fumble, and passes 12 times, completing 7, for 115 yards with no interceptions, then on average it will score 23.5 points. If it also throws one interception, the predicted score drops to 22.9.

• Now suppose a team rushes 12 times for 75 yards without fumbling, and passes 35 times, completing 19 for 315 yards with 2 interceptions. It will score an average of 24.085 points.

Example of a Balanced Offense

• A team that rushes the ball 25 times for 110 yards with 1 fumble, and passes 22 times, completing 12 for 145 yards with 2 interceptions, will score on average 17.57 points per game.

• If it throws only one interception, the predicted score becomes 18.13.
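As a check, the balanced-offense prediction can be worked term by term from the fitted regression equation (25 rushes for 110 yards, 22 passes with 12 completions for 145 yards, 2 interceptions, 1 fumble):

$$
\begin{aligned}
\text{Score} &= 4.92593 + 0.15441(25) + 0.05858(110) - 0.43734(22) + 0.20470(12) + 0.08080(145) - 0.56074(2) - 1.09088(1) \\
&= 4.92593 + 3.86025 + 6.44380 - 9.62148 + 2.45640 + 11.71600 - 1.12148 - 1.09088 \\
&= 17.56854 \approx 17.57
\end{aligned}
$$

With one interception instead of two, the sum rises by 0.56074 to 18.12928, or about 18.13.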

Statistical Comparison of Offenses

• Definitions:

1) An offense is run oriented if its rush attempts are 1.5 times its pass attempts or more.

2) An offense is pass oriented if its pass attempts are 1.5 times its rush attempts or more.

3) If a team’s passing and rushing attempts are within a factor of 1.5 of each other, the offense is balanced.
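The three definitions translate directly into a classification rule. A minimal sketch; the function name is hypothetical, and a ratio of exactly 1.5 counts as run or pass oriented because the definitions say "1.5 times or more":

```python
def offense_orientation(rush_att, pass_att, ratio=1.5):
    """Classify an offense using the 1.5-to-1 attempt-ratio rule.

    Run oriented:  rush attempts >= 1.5 * pass attempts
    Pass oriented: pass attempts >= 1.5 * rush attempts
    Balanced:      everything in between
    """
    if rush_att >= ratio * pass_att:
        return "Run"
    if pass_att >= ratio * rush_att:
        return "Pass"
    return "Balanced"


print(offense_orientation(35, 12))  # Run
print(offense_orientation(12, 35))  # Pass
print(offense_orientation(25, 22))  # Balanced
```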

The ANOVA Procedure

Tukey's Studentized Range (HSD) Test for score

This test controls the Type I experimentwise error rate.

Alpha                                 0.05
Error Degrees of Freedom              847
Error Mean Square                     89.94897
Critical Value of Studentized Range   3.32034

Comparisons significant at the 0.05 level indicated by ***.

orient        Difference Between   Simultaneous 95%
Comparison    Means                Confidence Limits
Run - Bala    5.2083               2.8845    7.5320   ***
Run - Pass    9.2933               6.7978   11.7888   ***
Bala - Pass   4.0850               2.3737    5.7963   ***
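With unequal group sizes, Tukey's simultaneous limits are usually computed in the Tukey-Kramer form: difference ± (q / √2) · √(MSE · (1/n_i + 1/n_j)), using the critical value and error mean square printed above. The sketch below assumes the output used that form. The group sizes are not shown on the slide; the values 114, 472, and 264 were back-solved from the printed interval widths (they sum to 850, consistent with the 847 error degrees of freedom) and should be treated as an assumption.

```python
import math

Q_CRIT = 3.32034      # critical value of the studentized range (from the output)
MSE = 89.94897        # error mean square (from the output)

# ASSUMPTION: group sizes are not reported; these were back-solved from the
# printed interval widths and sum to the 850 games in the sample.
GROUP_SIZES = {"Run": 114, "Bala": 472, "Pass": 264}

# Differences between means, copied from the output.
DIFFERENCES = {("Run", "Bala"): 5.2083, ("Run", "Pass"): 9.2933, ("Bala", "Pass"): 4.0850}


def tukey_kramer_limits(diff, n_i, n_j):
    """Simultaneous confidence limits for a difference of two group means."""
    half_width = Q_CRIT / math.sqrt(2) * math.sqrt(MSE * (1 / n_i + 1 / n_j))
    return diff - half_width, diff + half_width


for (a, b), diff in DIFFERENCES.items():
    lo, hi = tukey_kramer_limits(diff, GROUP_SIZES[a], GROUP_SIZES[b])
    marker = "***" if lo > 0 or hi < 0 else ""  # interval excludes zero
    print(f"{a} - {b}: {diff:7.4f}  ({lo:.4f}, {hi:.4f}) {marker}")
```

A comparison is flagged when its interval excludes zero; all three intervals lie entirely above zero, which is why every row carries ***.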

Interpretation of Comparisons

• There is a difference between the orientations of teams.

• In fact, all three are significantly different from each other!

• Run-oriented teams actually score more points on average than both pass-oriented and balanced offenses.

• Balanced offenses score more points on average than pass-oriented teams.

• Why? Possibly because running the football uses more of the game clock than passing does.

Determining the Probability of Winning the Game

• A win is given a value of 1. If a team ties or loses, they are given a value of 0.

• The only variables that are in a coach’s immediate control are whether they run or pass the ball on offense.

• For this reason, only rushing and passing attempts will be used as independent variables.

Distribution of Wins and Losses

Frequencies of Game Outcomes

                                 Cumulative   Cumulative
Outcome   Frequency   Percent    Frequency    Percent
Loss      426         50.12      426           50.12
Tie        23          2.71      449           52.82
Win       401         47.18      850          100.00

Method of Analysis

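• Logistic regression will be used to model the probability that a team wins.

• The form of the model is:

$$P(\text{Win}) = \frac{e^{\beta_0 + \beta_1 \cdot \text{rushes} + \beta_2 \cdot \text{passes}}}{1 + e^{\beta_0 + \beta_1 \cdot \text{rushes} + \beta_2 \cdot \text{passes}}}$$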

Estimation of Parameters

Analysis of Maximum Likelihood Estimates

                               Standard
Parameter   DF   Estimate      Error      Chi-Square   Pr > ChiSq
Intercept    1    -2.1140      0.5403        15.3082       <.0001
ratt         1     0.1259      0.0119       112.5931       <.0001
patt         1    -0.0481      0.0107        20.1301       <.0001


• Both rushing and passing attempts are significant factors in determining the probability of winning a game.

• The parameter estimates are on the log-odds scale (the natural logarithm of the odds of winning).

• Odds ratios will give more insight into how the model is affected by rushing and passing.

Fit of the Model

Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square   DF   Pr > ChiSq
9.2815        8   0.3191

With p = 0.3191, there is no evidence of lack of fit.

Odds Ratios

         Point      95% Wald
Effect   Estimate   Confidence Limits
ratt     1.134      1.108   1.161
patt     0.953      0.933   0.973

Interpretation of the Odds Ratios

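• For each additional rushing attempt, the odds in favor of winning are multiplied by 1.134 (about a 13% increase), holding pass attempts constant.

• For each additional pass attempt, the odds in favor of winning are multiplied by 0.953 (about a 5% decrease), holding rush attempts constant.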
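Each odds ratio is the exponential of its log-odds estimate, and its 95% Wald limits are the exponentials of estimate ± 1.96 × standard error. A short Python sketch using the maximum likelihood table; the names are our own:

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# (estimate, standard error) on the log-odds scale, from the maximum likelihood table.
LOGIT_ESTIMATES = {"ratt": (0.1259, 0.0119), "patt": (-0.0481, 0.0107)}


def odds_ratio_with_wald_ci(estimate, se, z=Z_95):
    """Exponentiate a log-odds coefficient and its Wald interval."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))


for name, (est, se) in LOGIT_ESTIMATES.items():
    point, lo, hi = odds_ratio_with_wald_ci(est, se)
    print(f"{name}: {point:.3f} ({lo:.3f}, {hi:.3f})")
```

This reproduces the printed odds ratios and limits to three decimals: 1.134 (1.108, 1.161) for `ratt` and 0.953 (0.933, 0.973) for `patt`.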

Conversion into Probabilities

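• The equation for the probability of winning a game:

$$P(\text{Win}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot \text{rushes} + \beta_2 \cdot \text{passes})}}$$

• This yields:

$$P(\text{Win}) = \frac{1}{1 + e^{-(-2.114 + 0.1259 \cdot \text{rushes} - 0.0481 \cdot \text{passes})}}$$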

Some Examples

• For a team rushing 25 times and passing 25 times, the model yields a probability of .458 that they will win the game, or a 45.8% chance they will win.

• If a team rushes 35 times and passes only 15 times, their probability of winning is .827, or nearly an 83% chance of victory.

• Now, say that team rushes only 15 times and passes 35 times, the probability changes to .129 or a 13% chance of winning.
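These probabilities follow from plugging the estimates into the logistic curve. A minimal Python sketch; the function name is our own, and the exponent carries a minus sign, which is algebraically the same as e^η / (1 + e^η):

```python
import math

B0, B_RUSH, B_PASS = -2.1140, 0.1259, -0.0481  # maximum likelihood estimates


def win_probability(rushes, passes):
    """Logistic model: P(Win) = 1 / (1 + exp(-(b0 + b1*rushes + b2*passes)))."""
    eta = B0 + B_RUSH * rushes + B_PASS * passes
    return 1.0 / (1.0 + math.exp(-eta))


for rushes, passes in [(25, 25), (35, 15), (15, 35), (50, 0)]:
    print(f"{rushes} rushes, {passes} passes: P(Win) = {win_probability(rushes, passes):.3f}")
```

With the rounded estimates the 35-rush, 15-pass case comes out at 0.828 rather than .827; the small gap presumably reflects unrounded coefficients in the original calculation.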

What the Model Does Not Suggest

• Given that the model predicts a higher success rate when a team rushes the ball, it may seem that a team should never pass. Taken to that extreme, the model gives a 98.5% chance of victory for 50 rushes and no passes.

• Obviously, if the other team knows you are never going to pass, you won’t be able to move the ball 10 yards in 3 plays very consistently. This shows how real-world circumstances aren’t always modeled perfectly.

Another Consideration

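• The model also does not take into account in-game strategy. For example, the farther a team falls behind, the more passes it will attempt, because passing takes less time off the game clock (incomplete passes stop the clock). This means part of the link between passing and losing may run in reverse: teams may pass more because they are losing, not only lose because they pass.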

Do the Year and Week Affect Points Scored?

• Year in and year out, rules are changed to increase scoring.

• Rule changes include:

(1) 2 point conversions allowed

(2) Defensive Line Encroachment Rules

(3) The 5-Yard Bump Rule on Receivers

(4) Etc.

2 Way ANOVA for Points Scored, Year and Week

                              Sum of
Source      DF        Squares    Mean Square    F Value    Pr > F
Model       25     321.405283      12.856211       1.11    0.3186
Error      824    9509.705603      11.540905
C Total    849    9831.110886

R-Square    Coeff Var    Root MSE    tscore Mean
0.032693     38.39034    3.397191       8.849077

Source   DF    Type III SS    Mean Square    F Value    Pr > F
year      9    133.2107197     14.8011911       1.28    0.2423
week     16    188.1945631     11.7621602       1.02    0.4333
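Every summary number in this table follows from the sums of squares and degrees of freedom: each F value is a mean square divided by the error mean square, R-Square is the model sum of squares over the total, and Coeff Var is 100 × Root MSE / mean. A standard-library Python sketch; note that the response is `tscore`, whose mean of about 8.85 suggests a transformed score rather than raw points, and the slides do not say which transformation was used. Adding the two Type III sums of squares to get the model sum of squares works here because the design is balanced (5 teams in every week of every year).

```python
import math

# Sums of squares and degrees of freedom from the two-way ANOVA output.
SS = {"year": 133.2107197, "week": 188.1945631, "error": 9509.705603}
DF = {"year": 9, "week": 16, "error": 824}
TSCORE_MEAN = 8.849077

mse = SS["error"] / DF["error"]
for source in ("year", "week"):
    f_value = (SS[source] / DF[source]) / mse
    print(f"{source}: F = {f_value:.2f}")

# Balanced design: the Type III sums of squares add up to the model sum of squares.
ss_model = SS["year"] + SS["week"]
r_square = ss_model / (ss_model + SS["error"])
root_mse = math.sqrt(mse)
coeff_var = 100 * root_mse / TSCORE_MEAN
print(f"R-Square = {r_square:.6f}, Root MSE = {root_mse:.6f}, Coeff Var = {coeff_var:.5f}")
```

The p-values themselves need the F distribution (for example `scipy.stats.f.sf`), which is left out to keep the sketch to the standard library.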

Interpretation of the 2 Way ANOVA

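• The model indicates there is not sufficient evidence to conclude that Year or Week affects how many points are scored (p = 0.2423 for year and 0.4333 for week).

• In other words, this sample shows no detectable difference in points scored across the weeks and years of the 1990s. This does not prove that the rule changes had no effect; it means any effect was too small to detect with this model and sample.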

Conclusion

All three models of scoring and winning (the regression, the comparison of offensive orientations, and the logistic regression) point toward rushing attempts as the important statistic in determining the points a team scores and whether or not it wins the game. Ball control is thus the essence of winning a football game. This is most readily seen in a team that rushes with consistency. A team is in a better position to win if it can run consistently and pass only occasionally.

References

(1) Sanford Weisberg, Applied Linear Regression, 2nd Edition, pp. 135-136. John Wiley and Sons, 1985.

(2) Neter, Kutner, Nachtsheim, and Wasserman, Applied Linear Statistical Models, 4th Edition, pp. 54-55. Irwin (Chicago), 1996.

(3) Professor Kate Cowles, Department of Statistics and Actuarial Science, University of Iowa.

(4) Data collected from: http://www.mrncaa.com
