Michael Grenon CS378 Data Mining Spring 2018 Optimizing...

Post on 09-Aug-2020

Optimizing Baseball Performance and Player Salary

Michael Grenon
CS378 Data Mining

Spring 2018

Baseball ♥ Stats

North America ♥ Baseball

Franchise Entertainment organization

money
data

wins

2017*

Optimization

Salary optimization
How did teams optimize their player salaries?

● No Salary Cap!
● Similarity problem
  ○ Linear correlation
  ○ Pearson correlation coefficient
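
As a rough illustration of the Pearson correlation coefficient named above, the following minimal sketch computes r for two equal-length lists in plain Python. The function name `pearson_r` and the payroll and win values are made up for illustration; they are not the data behind the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values (NOT real team data): payroll in $M and wins.
payroll_musd = [200, 150, 120, 90, 60]
wins = [95, 80, 88, 75, 70]
r = pearson_r(payroll_musd, wins)
print(f"r = {r:.3f}")
```

A value near +1 or -1 indicates a strong linear association, while a value near 0 indicates little linear association; r says nothing about causation.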

How to Play to Win?
Which aspects of play most strongly correlate with winning?

● Similarity problem
  ○ (Linear) association
  ○ Pearson correlation coefficient

How do the best teams use their players?
How frequently are certain players used in games?

● Frequent item set problem
  ○ Apriori algorithm
  ○ Support threshold?
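
To make the frequent item set framing concrete, here is a minimal sketch of the Apriori algorithm in plain Python, treating each game as a "transaction" whose items are the players used. Modeling a game as a set of player labels is an assumption for illustration, as are the labels `P1` to `P4` and the 0.6 support threshold; the slides leave the threshold as an open question.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support.

    transactions: list of sets; min_support: fraction in (0, 1].
    """
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Made-up illustrative games (NOT real lineups): players used in each game.
games = [
    {"P1", "P2", "P3"},
    {"P1", "P2"},
    {"P1", "P3"},
    {"P2", "P3", "P4"},
    {"P1", "P2", "P3"},
]
result = apriori(games, min_support=0.6)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Raising the support threshold shrinks the result toward only the most regularly used combinations; lowering it admits rarer groupings at the cost of many more candidates to check.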

R = 0.253

Price Per Win

E = win% * (ppw rank + win rank)
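
The formula above can be sketched in code under some loudly stated assumptions, since the slide does not define its terms: price per win (ppw) is taken to be payroll divided by wins, and both ranks are assumed to run from 1 (worst) to N (best), so that a larger E means a more efficient team. The function name `efficiency_scores` and the three teams are hypothetical.

```python
def efficiency_scores(teams):
    """teams: {name: (payroll, wins, losses)} -> {name: E}.

    ASSUMPTIONS (not stated on the slide):
      * price per win (ppw) = payroll / wins
      * ranks run from 1 (worst) to N (best), so a cheaper ppw and a
        higher win total both earn a larger rank number
      * ties are broken arbitrarily by sort order
    """
    names = list(teams)
    ppw = {t: teams[t][0] / teams[t][1] for t in names}
    wins = {t: teams[t][1] for t in names}
    win_pct = {t: teams[t][1] / (teams[t][1] + teams[t][2]) for t in names}

    # Most expensive ppw first, so it gets rank 1; cheapest gets rank N.
    by_ppw = sorted(names, key=lambda t: -ppw[t])
    ppw_rank = {t: i for i, t in enumerate(by_ppw, start=1)}
    # Fewest wins first, so it gets rank 1; most wins gets rank N.
    by_wins = sorted(names, key=lambda t: wins[t])
    win_rank = {t: i for i, t in enumerate(by_wins, start=1)}

    # E = win% * (ppw rank + win rank), as written on the slide.
    return {t: win_pct[t] * (ppw_rank[t] + win_rank[t]) for t in names}

# Made-up illustrative teams (NOT real data): (payroll in $M, wins, losses).
teams = {
    "Team A": (200.0, 100, 62),
    "Team B": (80.0, 90, 72),
    "Team C": (150.0, 70, 92),
}
scores = efficiency_scores(teams)
for name, e in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: E = {e:.3f}")
```

If the original ranks ran in the opposite direction (1 = best), the ordering of E would flip, so the rank convention should be confirmed before reusing this.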

R = -0.555

BsR ≈ Baserunning WAR

Taken from team_batting(2017)

R = 0.548

wSL ≈ weighted Slider

Taken from team_batting(2017)

R = 0.500

¯\_(ツ)_/¯

What’s next?
How frequently are certain players used in games?

● Frequent item set problem
  ○ Apriori algorithm
  ○ Support threshold?

What’s next?
To what extent are win-loss record and attendance related?

● Extending Pearson correlation analysis

Preliminary Conclusions

● Hitting coaches: teach how to hit a slider
  ○ Pitch most “cost-efficient” to excel at hitting
  ○ ...but not by much
● Fielding coaches: emphasize speed and skill on baserunning
  ○ More closely associated with salary efficiency than any other performance
    ■ Even batting, pitching

Preliminary Conclusions

● Chess match: all pieces are important
● 2-sided game
● Predictive (vs. descriptive) statistics
  ○ Time-series analysis

Preliminary Conclusions

● Too much data
  ○ batting_stats(): 287 attributes?
● Statcast data
  ○ More complex mining techniques
  ○ Neural Nets
● Data warehouses incomplete, disorganized
  ○ Private sector