Perfunctory NBA Analysis

Radu Stancut

Applied Data Science

Foundations Project Outline/Notes

Introduction

This Foundations paper looks at the performance of professional basketball teams in the National Basketball Association (NBA) over the past ten years and investigates the relationship between offensive and defensive team statistics and win totals and, by extension, playoff appearances. Specifically, I look at field goal percentage (FG%) on both offense, the proficiency with which a team scores baskets, and defense, the effectiveness with which it keeps the other team from scoring. Above-average performance in either category is expected to have a positive influence on winning games, but just how much remains to be seen.

Statistical analysis and sports have become increasingly intertwined in the public imagination over the last decade. The reader may be aware of Moneyball, either the book or the movie, if not both, which describes how Billy Beane, General Manager of Major League Baseball’s (MLB) Oakland Athletics, was driven by a limited payroll to use statistical analysis to find underappreciated assets, in this case ballplayers, with which to build several winning teams. This quantification of sports has in fact been going on for several decades, from Bill James’ Baseball Abstract in the 1980s through to today’s ESPN-backed website from Nate Silver, fivethirtyeight.com, and on to the MIT Sloan Sports Analytics Conference.

This paper is meant to provide a surface-level investigation into some of these sports statistics and their influence on team outcomes. Below I describe the data collected, its analysis, and my findings and conclusions, followed by ideas on next steps.

Data Description

NBA team statistics for regular seasons 2005 – 2014 were collected as CSV files from Basketball-Reference.com. Two tables were collected per season, one for offensive statistics and one for defensive statistics. Each table consisted of team names and a breakdown of team performance by minutes, points, assists, rebounds, field goal percentages, and other categories (a full list may be found in the appendix). In addition to the season performance metrics, I also collected the win/loss standings and conference/division information for each season in an Excel document that I later formatted for merging.

The offense and defense statistics were kept separate from one another, but the ten seasons had to be combined into one data frame for analysis. The seasons were combined with a for loop on import in Python. The win/loss records and conference/division data were also imported and subsequently merged, leading to two data frames, one each for offense and defense, of 300 instances each (30 teams, ten years) with varying performance measurements (list in appendix).
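
The combining and merging steps described above can be sketched roughly as follows. This is a minimal illustration assuming pandas, not the code actually used for the paper: the in-memory CSV text, the team names, the numbers, and the merge keys are all made up for the example, and real files would be read from disk instead.

```python
import io
import pandas as pd

# Made-up stand-ins for two per-season offense tables (values are illustrative,
# not real NBA statistics). Real exports would be read with pd.read_csv(path).
season_csvs = {
    2013: "Team,FG%,PTS\nTeam A,0.480,8400\nTeam B,0.440,7900\n",
    2014: "Team,FG%,PTS\nTeam A,0.470,8300\nTeam B,0.450,8000\n",
}

frames = []
for year, text in season_csvs.items():
    season = pd.read_csv(io.StringIO(text))
    season["Year"] = year  # tag every row with its season before stacking
    frames.append(season)
offense = pd.concat(frames, ignore_index=True)

# Made-up standings table holding win/loss and conference/division data.
standings = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team A", "Team B"],
    "Year": [2013, 2013, 2014, 2014],
    "Wins": [60, 30, 55, 35],
    "Losses": [22, 52, 27, 47],
    "Conference": ["East", "East", "East", "East"],
    "Division": ["Atlantic", "Central", "Atlantic", "Central"],
})

# Assumed merge keys: one row per team per season on both sides.
merged = offense.merge(standings, on=["Team", "Year"], how="left",
                       validate="one_to_one")
print(merged)
```

The `validate="one_to_one"` argument makes the merge fail loudly if a team appears twice in a season on either side, which is a cheap check that the result still has one row per team per year.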

Several calculation and identification steps were taken prior to running the regression analysis:

• The winning percentage (Win%) of each team, by year, was calculated and placed into a new column.

• A column was created to flag whether a team had a winning record (Win% > 0.5).

• Playoff teams were identified and flagged in a new binary column (Playoffs).

• FG% was normalized by grouping along Year, Year & Conference, and Year & Division, and setting the group average to 1.0 to allow for comparative analysis across seasons. A rating above 1.0 in FG% on offense meant better than average, while the reverse was true for defensive FG%: a team had to be below 1.0 to be ‘above’ average in keeping opponents from scoring (team examples in appendix).
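
As a rough illustration of these steps, the sketch below computes Win%, the winning-record flag, and two of the indexed FG% columns on a tiny made-up table. It assumes pandas and assumes that "setting the average to 1.0" means dividing each team's FG% by the mean FG% of its group; the paper does not spell out the exact formula, and the data values are invented.

```python
import pandas as pd

# Made-up offense rows for one season (not real NBA statistics).
teams = pd.DataFrame({
    "Team": ["A", "B", "C", "D"],
    "Year": [2014, 2014, 2014, 2014],
    "Conference": ["East", "East", "West", "West"],
    "Wins": [60, 30, 50, 24],
    "Losses": [22, 52, 32, 58],
    "FG%": [0.480, 0.440, 0.470, 0.450],
})

# Winning percentage and winning-record flag.
teams["Win%"] = teams["Wins"] / (teams["Wins"] + teams["Losses"])
teams["Winning Record"] = teams["Win%"] > 0.5

# Indexed FG%: each value divided by its group mean, so every group averages 1.0.
# (Assumed formula; the Year & Division index would follow the same pattern.)
year_mean = teams.groupby("Year")["FG%"].transform("mean")
conf_mean = teams.groupby(["Year", "Conference"])["FG%"].transform("mean")
teams["FG% IDX_Year"] = teams["FG%"] / year_mean
teams["FG% IDX_Conf"] = teams["FG%"] / conf_mean

print(teams[["Team", "Win%", "Winning Record", "FG% IDX_Year", "FG% IDX_Conf"]])
```

Applied to the defense table, the same calculation gives an index where values below 1.0 indicate better-than-average defense.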

Descriptive Statistics

To provide a general picture and context for the findings described in the next section, I have outlined some exploratory numbers on the data here:

• The win distribution of NBA teams over this ten-year range tends toward a normal distribution (summary statistics in appendix).

• Winning teams had the following min, max, and mean indexed FG% over ten years:

o Offense – 0.945 (min/worst); 1.102 (max/best); 1.015 (mean)

o Defense – 0.917 (min/best); 1.030 (max/worst); 0.982 (mean)

• Losing teams had the following min, max, and mean indexed FG% over ten years:

o Offense – 0.924 (min/worst); 1.044 (max/best); 0.984 (mean)

o Defense – 0.939 (min/best); 1.082 (max/worst); 1.018 (mean)

Methods & Analysis

As seen in the descriptive statistics above, my expectation that above-average performance in either category would have a positive influence on winning games has some superficial merit. To delve deeper, I created scatter plots (see appendix), by offense and defense, for FG% indexed by year (FG% IDX_Year), by year & conference (FG% IDX_Conf), and by year & division (FG% IDX_Div). NBA teams play most of their games within their conference (52 of 82), and a sizable share of those games are against division rivals (16 of 52); the field goal groupings were meant to show whether any of the performance indexes provided a better predictor of wins.

The slopes of the fitted lines for each indexed FG% group, league-wide, were:

• Offense

o FG% IDX_Year: 3.011

o FG% IDX_Conf: 3.048

o FG% IDX_Div: 3.419

• Defense

o FG% IDX_Year: -3.500

o FG% IDX_Conf: -3.595

o FG% IDX_Div: -3.667

Two things jump out from the numbers above. First, defensive FG% appeared to have a bigger influence, as measured by the magnitude of the slope, than offensive FG%. Second, the more specific indexes, by conference and by division, had steeper slopes for both offense and defense.
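
For readers who want to see how such a slope is obtained, here is a minimal sketch assuming NumPy. The five data points are made up for illustration and are not taken from the paper's data; the paper does not say which tool produced its fitted lines, so `np.polyfit` is simply one common choice.

```python
import numpy as np

# Made-up (indexed FG%, Win%) pairs, not real team data.
fg_idx = np.array([0.95, 0.98, 1.00, 1.02, 1.05])
win_pct = np.array([0.35, 0.42, 0.50, 0.55, 0.68])

# Degree-1 least-squares fit: returns the slope first, then the intercept.
slope, intercept = np.polyfit(fg_idx, win_pct, deg=1)

# Cross-check with the closed form: slope = cov(x, y) / var(x).
slope_check = np.cov(fg_idx, win_pct, ddof=1)[0, 1] / np.var(fg_idx, ddof=1)

print(f"slope={slope:.3f}, intercept={intercept:.3f}, check={slope_check:.3f}")
```

A slope of about 3 on this scale would mean that a team whose index is 0.01 above average is associated with a Win% roughly 0.03 higher, or about two and a half wins over an 82-game season.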

Following the scatter plots and fitted lines, a linear regression analysis was run; the dependent variable in all instances was Win%, with the independent variable alternating between the offensive and defensive FG% IDX fields. Here there was little to no difference in results between the league, conference, and division comparisons, so only the broadest measure, league-wide, is outlined below and included in the appendix:

• Offense

o Coef: 0.5025; Std err: 0.008; R-squared: 0.924

• Defense

o Coef: 0.4961; Std err: 0.010; R-squared: 0.901

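The results indicate that both offensive and defensive FG% performance numbers do a good job of explaining winning in the NBA. Of course, much like the influence of education on wages discussed in class, this could be due to other factors being bundled within these measurements. One caveat applies when reading these numbers: the appendix output shows 299 residual degrees of freedom for 300 observations and one regressor, which indicates the models were fit without an intercept. The reported R-squared is therefore the uncentered version, which is typically much higher than the R-squared of a model with a constant and should not be read as the share of variation in Win% explained.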
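
The appendix output reports 299 residual degrees of freedom for 300 observations, which is what a fit without a constant term produces. The sketch below, assuming NumPy and using synthetic data only (the generating slope of 3.0 and the noise levels are arbitrary choices, not estimates from the paper), compares a no-intercept fit with a fit that includes an intercept to show how differently the two R-squared values behave.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: an index centered on 1.0 and a Win%-like outcome around 0.5.
x = rng.normal(1.0, 0.03, size=300)
y = 0.5 + 3.0 * (x - 1.0) + rng.normal(0.0, 0.12, size=300)

def ols(X, y):
    """Least-squares coefficients and residuals for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

# Fit 1: no intercept (single column). R-squared is uncentered: 1 - SSR / sum(y^2).
beta_no_const, resid_no_const = ols(x[:, None], y)
r2_uncentered = 1 - (resid_no_const @ resid_no_const) / (y @ y)

# Fit 2: intercept plus slope. R-squared is centered: 1 - SSR / sum((y - mean)^2).
X_const = np.column_stack([np.ones_like(x), x])
beta_const, resid_const = ols(X_const, y)
y_dev = y - y.mean()
r2_centered = 1 - (resid_const @ resid_const) / (y_dev @ y_dev)

print("no intercept: coef =", beta_no_const[0], " R2 (uncentered) =", r2_uncentered)
print("with intercept: slope =", beta_const[1], " R2 (centered) =", r2_centered)
```

In the no-intercept fit the coefficient lands near the ratio of average Win% to average index (about 0.5), regardless of how strongly the two move together, while the fit with an intercept recovers a slope comparable to those of the fitted lines on the scatter plots.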

Discussion & Next Steps

Taking up the thread of bundled influences above, this research would need to be expanded to other team performance statistics to properly weigh the impact that FG% alone has on winning. Additionally, the analysis contained here dealt only with end-of-season numbers and does not purport to be predictive for in-season numbers. This is another area where further research would be needed, e.g., how a team’s performance up to the All-Star break (about halfway into the season) factors into the remaining games and the final win/loss record.

Appendix

List of Fields Used in Analysis, Including ‘Index’ Fields Created

'2P', '2P%', '2PA', '3P', '3P%', '3PA', 'AST', 'BLK',
'DEF Rk', 'DRB', 'FG', 'FG%', 'FGA', 'FT', 'FT%', 'FTA',
'G', 'Losses', 'MP', 'ORB', 'PF', 'PTS', 'PTS/G',
'Playoffs', 'STL', 'TOV', 'TRB', 'Team', 'Team2', 'Wins', 'Year',
'Conference', 'Division', 'Win%', 'Winning Record',
'FG% Year', '2P% Year', '3P% Year',
'FG% Conf', '2P% Conf', '3P% Conf',
'FG% Div', '2P% Div', '3P% Div',
'FG% IDX_Year', 'FG% IDX_Conf', 'FG% IDX_Div'

NBA Team Win Description (Histogram Companion)

count    300.000000
mean      40.196667
std       12.610899
min        7.000000
25%       31.750000
50%       41.000000
75%       50.000000
max       67.000000

Offense Linear Regression

                            OLS Regression Results
==============================================================================
Dep. Variable:                   Win%   R-squared:                       0.924
Model:                            OLS   Adj. R-squared:                  0.924
Method:                 Least Squares   F-statistic:                     3635.
Date:                Wed, 10 Dec 2014   Prob (F-statistic):          2.31e-169
Time:                        11:54:17   Log-Likelihood:                 155.31
No. Observations:                 300   AIC:                            -308.6
Df Residuals:                     299   BIC:                            -304.9
Df Model:                           1
================================================================================
                   coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------
FG% IDX_Year     0.5025      0.008     60.292      0.000         0.486     0.519
==============================================================================
Omnibus:                       20.895   Durbin-Watson:                   2.137
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                8.860
Skew:                          -0.167   Prob(JB):                       0.0119
Kurtosis:                       2.227   Cond. No.                         1.00
==============================================================================

Defense Linear Regression

                            OLS Regression Results
==============================================================================
Dep. Variable:                   Win%   R-squared:                       0.901
Model:                            OLS   Adj. R-squared:                  0.900
Method:                 Least Squares   F-statistic:                     2710.
Date:                Wed, 10 Dec 2014   Prob (F-statistic):          6.03e-152
Time:                        11:05:58   Log-Likelihood:                 115.08
No. Observations:                 300   AIC:                            -228.2
Df Residuals:                     299   BIC:                            -224.5
Df Model:                           1
================================================================================
                   coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------
FG% IDX_Year     0.4961      0.010     52.055      0.000         0.477     0.515
==============================================================================
Omnibus:                       20.496   Durbin-Watson:                   2.091
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                8.474
Skew:                          -0.142   Prob(JB):                       0.0145
Kurtosis:                       2.227   Cond. No.                         1.00
==============================================================================