Optimizing Baseball Performance and Player Salary Michael Grenon CS378 Data Mining Spring 2018


Optimizing Baseball Performance and Player Salary

Michael Grenon
CS378 Data Mining

Spring 2018


Baseball ♥ Stats


North America ♥ Baseball


Franchise

Entertainment organization

money

data

wins

2017*

Optimization


Salary optimization

How did teams optimize their player salaries?

● No Salary Cap!
● Similarity problem
  ○ Linear correlation
  ○ Pearson correlation coefficient
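The Pearson correlation coefficient named on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the slides: the function name `pearson_r` and the payroll and win figures are made up for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical payrolls (millions of dollars) and season wins, not real data.
payroll = [200.0, 150.0, 120.0, 90.0, 70.0]
wins = [95, 80, 88, 75, 70]
r = pearson_r(payroll, wins)
print(f"r = {r:.3f}")
```

A value near +1 or -1 indicates a strong linear association, while a value near 0 indicates a weak one. The coefficient only measures linear association and says nothing about causation.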


How to Play to Win?

Which aspects of play most strongly correlate with winning?

● Similarity problem
  ○ (Linear) association
  ○ Pearson correlation coefficient


How do the best teams use their players?

How frequently are certain players used in games?

● Frequent item set problem
  ○ Apriori algorithm
  ○ Support threshold?
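To make the frequent item set framing concrete, here is a minimal Apriori sketch in which each game's lineup is one transaction and each player is one item. It is an illustration under stated assumptions, not the project's implementation: the player IDs, the lineups, and the 0.6 support threshold are all hypothetical, and the slide itself leaves the threshold as an open question.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    current = {}
    for item in {item for t in transactions for item in t}:
        single = frozenset([item])
        s = support(single)
        if s >= min_support:
            current[single] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical lineups: one set of player IDs per game.
games = [
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "B", "C"},
    {"B", "C", "D"},
    {"A", "C"},
]
frequent = apriori(games, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"support = {s:.2f}")
```

Raising the support threshold shrinks the result to only the most consistently used player combinations. Lowering it lets rarer combinations through at the cost of many more candidates to count.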


R = 0.253


Price Per Win


E = win% * (ppw rank + win rank)

R = -0.555
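The efficiency formula above can be sketched as follows. Several details are assumptions, because the slides do not spell them out: price per win is taken to be payroll divided by wins, both ranks are scored so that the best team receives the largest number (so a higher E means a more efficient team), ties are broken by input order, and a 162-game season is used for win%. The team names and figures are hypothetical.

```python
def points(values, higher_is_better):
    """Score each value from 1 (worst) to N (best); ties are broken by input order."""
    # Sort indices from worst to best, then hand out 1..N in that order.
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not higher_is_better,
    )
    scores = [0] * len(values)
    for position, i in enumerate(order, start=1):
        scores[i] = position
    return scores


def efficiency(teams, games=162):
    """Compute E = win% * (ppw rank + win rank) under the assumptions above."""
    names = list(teams)
    payrolls = [teams[name][0] for name in names]
    wins = [teams[name][1] for name in names]
    ppw = [p / w for p, w in zip(payrolls, wins)]          # assumed: payroll / wins
    ppw_rank = points(ppw, higher_is_better=False)         # cheaper wins score higher
    win_rank = points(wins, higher_is_better=True)         # more wins score higher
    return {
        name: (w / games) * (pr + wr)
        for name, w, pr, wr in zip(names, wins, ppw_rank, win_rank)
    }


# Hypothetical teams: (payroll in millions of dollars, wins). Not real data.
teams = {
    "Team A": (200.0, 100),
    "Team B": (100.0, 90),
    "Team C": (150.0, 60),
}
E = efficiency(teams)
for name, value in sorted(E.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: E = {value:.3f}")
```

If the original analysis ranked the best team as 1 instead, the sign of any correlation computed against E would flip, so the rank direction should be checked before comparing against the R value reported here.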


BsR ≈ Baserunning WAR

Taken from team_batting(2017)

R = 0.548


wSL ≈ weighted Slider

Taken from team_batting(2017)

R = 0.500


¯\_(ツ)_/¯


What’s next?

How frequently are certain players used in games?

● Frequent item set problem
  ○ Apriori algorithm
  ○ Support threshold?


What’s next?

To what extent are win-loss record and attendance related?

● Extending Pearson correlation analysis


Preliminary Conclusions

● Hitting coaches: teach how to hit a slider
  ○ Pitch most “cost-efficient” to excel at hitting
  ○ ...but not by much
● Fielding coaches: emphasize speed and skill on baserunning
  ○ More closely associated with salary efficiency than any other performance
    ■ Even batting, pitching


Preliminary Conclusions

● Chess match: all pieces are important
● 2-sided game
● Predictive (vs. descriptive) statistics
  ○ Time-series analysis


Preliminary Conclusions

● Too much data
  ○ batting_stats(): 287 attributes?
● Statcast data
  ○ More complex mining techniques
  ○ Neural Nets
● Data warehouses incomplete, disorganized
  ○ Private sector