A Quantitative Analysis of Success Factors in the Association of Tennis Professionals (ATP) Nick...

Give Me A Break: A Quantitative Analysis of Success Factors in the Association of Tennis Professionals (ATP). www.talksport.co.uk. Nick Korach, UP-STAT 2013

  • Slide 1
  • A Quantitative Analysis of Success Factors in the Association of Tennis Professionals (ATP) www.talksport.co.uk Nick Korach UP-STAT 2013
  • Slide 2
  • Overview
    I. Introduction
    II. Research Objective
    III. Research Process
        A. Data Collection
        B. Supervised Learning Techniques
        C. Unsupervised Learning Techniques
    IV. Results
    V. Conclusions
    VI. Extensions
  • Slide 3
  • Introduction: Why Choose Tennis?
    1. In the ever-growing field of sports statistics, very little research has been done on tennis.
    2. It is one of my favorite sports.
  • Slide 4
  • Research Objectives
    1. To discover which factors are most important in determining the success of male singles players (measured by ATP Singles Points).
    2. To reduce the dimensionality of the predictor variables in order to identify new, significant underlying variables.
  • Slide 5
  • Data Collection: ATP singles data for five years (2008, 2009, 2010, 2011, 2012), covering the top 100 ranked male singles players in each year. (www.faconnable.com)
  • Slide 6
  • Data Collection: cumulative season match statistics (www.atpworldtour.com)
    - 1 response variable (Y)
    - 10 offense/serving predictor variables (X_i)
    - 7 defense/return predictor variables (X_i)
    - 1 additional predictor variable (X_i)
  • Slide 7
  • Response (Y) Variable: ATP Singles Points
    - Each ATP tournament is worth a certain number of ATP Singles Points, generally 250, 500, 1000, or 2000 (Grand Slam).
    - The points a player earns depend on how far the player advances in the tournament.
    - The ranking period is the past 52 weeks.
  • Slide 8
  • Current ATP Rankings (blank Week Change cells did not survive in this transcript)

    Rank  Name                    Nationality  Points  Week Change  Tourn. Played
       1  Novak Djokovic          SRB          12,370            0             19
       2  Andy Murray             GBR           8,750            1             19
       3  Roger Federer           SUI           8,670                          20
       4  David Ferrer            ESP           7,050            1             26
       5  Rafael Nadal            ESP           6,385                          20
       6  Tomas Berdych           CZE           5,145            0             24
       7  Juan Martin Del Potro   ARG           4,750            0             22
       8  Jo-Wilfried Tsonga      FRA           3,660            0             26
       9  Richard Gasquet         FRA           3,230            1             23
      10  Janko Tipsarevic        SRB           3,000                          29
  • Slide 9
  • Predictor Variables - Serving (www.bleacherreport.com)
    - Number of Aces
    - Number of Double Faults
    - 1st Serve Percentage
    - Win Percentage of 1st Serve Points
    - Win Percentage of 2nd Serve Points
    - Number of Break Points Faced
    - Percentage of Break Points Saved
    - Service Games Played
    - Win Percentage of Service Games
    - Win Percentage of Service Points
  • Slide 10
  • Predictor Variables - Returning (www.bleacherreport.com)
    - Win Percentage of 1st Serve Return Points
    - Win Percentage of 2nd Serve Return Points
    - Number of Break Point Opportunities
    - Percentage of Break Points Converted
    - Return Games Played
    - Win Percentage of Return Games
    - Win Percentage of Return Points
  • Slide 11
  • Predictor Variables - Other (www.bleacherreport.com)
    - Win Percentage of Total Points
  • Slide 12
  • Data Mining Techniques
    1. Supervised Learning Techniques: both the response variable (Y) and the explanatory variables (X_i) are used.
       - Multiple Linear Regression
    2. Unsupervised Learning Techniques: only the explanatory variables (X_i) are used.
       - Cluster Analysis
       - Principal Component Analysis
  • Slide 13
  • Supervised Learning. Regression Analysis: a statistical technique for finding the relationship between one or more predictor variables (X_i) and a response (Y).
    $$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon$$
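A minimal sketch of fitting such a model in R with `lm`. The ATP data set is not reproduced in this transcript, so the data here are simulated; the column names `bp_opps` and `double_faults` and every coefficient are made up for illustration. The response is modelled on the log scale, as the regression output in the talk notes ("log(Y) used").

```r
set.seed(1)
n <- 100                                   # e.g. 100 players in one season

# Hypothetical predictors (names and distributions are assumptions)
sim <- data.frame(
  bp_opps       = rpois(n, 400),           # break point opportunities
  double_faults = rpois(n, 150)            # double faults
)

# Simulated response on the log scale:
# log(Y) = b0 + b1 * X1 + b2 * X2 + noise
sim$log_pts <- 5 + 0.005 * sim$bp_opps - 0.002 * sim$double_faults +
  rnorm(n, sd = 0.2)

# "~ ." regresses the response on every other column, as in the talk's call
fit <- lm(log_pts ~ ., data = sim)
print(summary(fit)$coefficients)
```

The estimated slopes should land close to the simulated values 0.005 and -0.002, which is a quick way to check that the model is specified as intended.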
  • Slide 14
  • 2012 ATP Singles Points (Y)
  • Slide 15
  • Call:
    lm(formula = ATP.Singles.Pts ~ ., data = xy)

    Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)                 0.8386094  1.7168561   0.488  0.62655
    Aces                        0.0001113  0.0004352   0.256  0.79883
    Double.Faults              -0.0017176  0.0007569  -2.269  0.02591 *
    X1st.Serve..                0.0106702  0.0119668   0.892  0.37522
    W...1st.Serve.Pts           0.0516267  0.0335954   1.537  0.12826
    W...2nd.Serve.Pts           0.0428350  0.0249383   1.718  0.08968 .
    No..Break.Pts.Faced        -0.0039531  0.0009885  -3.999  0.00014 ***
    X..Break.Pts.Saved          0.0391788  0.0085976   4.557 1.81e-05 ***
    Service.Gms.Played          0.0045717  0.0031527   1.450  0.15090
    Service.Gms.W..            -0.0538452  0.0223019  -2.414  0.01802 *
    Service.Pts.W..             0.0095231  0.0614412   0.155  0.87721
    W...1st.Serve.Return.Pts    0.0054565  0.0340298   0.160  0.87301
    W...2nd.Serve.Return.Pts   -0.0067646  0.0214298  -0.316  0.75307
    No..Break.Pt.Opportunities  0.0051992  0.0010445   4.978 3.56e-06 ***
    X..Break.Pts.Converted      0.0309062  0.0094591   3.267  0.00159 **
    Return.Gms.Played          -0.0042145  0.0031852  -1.323  0.18952
    W...Return.Gms             -0.0234497  0.0247880  -0.946  0.34696
    W...Return.Pts              0.0492521  0.0487705   1.010  0.31556
    Total.Pts.W..              -0.0389876  0.0759609  -0.513  0.60917
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.2147 on 81 degrees of freedom
    Multiple R-squared: 0.9215, Adjusted R-squared: 0.9041
    F-statistic: 52.83 on 18 and 81 DF, p-value: < 2.2e-16

    Note: log(Y) used
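Two pieces of arithmetic help when reading this output; the sketch below uses only numbers copied from it. First, the adjusted R-squared follows from the multiple R-squared, the 18 predictors, and n = 100 (81 residual degrees of freedom plus 19 estimated coefficients). Second, because log(Y) is the response, exponentiating a coefficient gives a multiplicative effect on ATP points. Reading one unit of `X..Break.Pts.Saved` as one percentage point is an assumption about how the percentages were stored.

```r
# Values copied from the full-model output
r2 <- 0.9215          # Multiple R-squared
n  <- 100             # observations: 81 residual df + 19 coefficients
p  <- 18              # number of predictors

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))

# With log(Y) as the response, exp(coefficient) is a multiplicative effect on Y
b_saved    <- 0.0391788                    # coefficient of X..Break.Pts.Saved
pct_change <- (exp(b_saved) - 1) * 100     # percent change in Y per unit increase
print(round(pct_change, 1))
```

The first value reproduces the reported adjusted R-squared; the second says that, holding the other predictors fixed, one extra unit of break points saved is associated with roughly a 4% increase in points under this model.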
  • Slide 16
  • Slide 17
  • Possible Multicollinearity? Multicollinearity occurs when two or more predictor variables are highly correlated with one another. The two best examples:
    - Win Percentage of Service Points and Win Percentage of Service Games
    - Win Percentage of Return Points and Win Percentage of Return Games
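One common way to check for multicollinearity is to look at pairwise correlations and at variance inflation factors (VIF). The sketch below is an illustration on simulated data, not the analysis from the talk: the variables `svc_pts_won`, `svc_gms_won`, and `aces` and their distributions are assumptions, and the VIF is computed by hand with base R rather than with any add-on package.

```r
set.seed(2)
n <- 100

# Hypothetical data: service games won is driven by service points won
svc_pts_won <- rnorm(n, mean = 64, sd = 3)
svc_gms_won <- 80 + 2.5 * (svc_pts_won - 64) + rnorm(n, sd = 1.5)
aces        <- rpois(n, 300)               # unrelated to the other two
x <- data.frame(svc_pts_won, svc_gms_won, aces)

# Pairwise correlations between predictors
print(round(cor(x), 2))

# VIF for each predictor: 1 / (1 - R^2), where R^2 comes from regressing
# that predictor on all of the other predictors
vif <- sapply(names(x), function(v) {
  f  <- reformulate(setdiff(names(x), v), response = v)
  r2 <- summary(lm(f, data = x))$r.squared
  1 / (1 - r2)
})
print(round(vif, 1))
```

A VIF near 1 indicates little overlap with the other predictors, while large values (a frequently quoted rule of thumb is above 5 or 10) flag predictors whose coefficients will be unstable.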
  • Slide 18
  • Reduced model, obtained by stepwise regression using the Bayesian Information Criterion

    Call:
    lm(formula = ATP.Singles.Pts ~ Double.Faults + W...1st.Serve.Pts +
        W...2nd.Serve.Pts + No..Break.Pts.Faced + X..Break.Pts.Saved +
        Service.Gms.Played + Service.Gms.W.. + No..Break.Pt.Opportunities +
        X..Break.Pts.Converted, data = xy)

    Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)                 2.5973047  0.8292794   3.132 0.002343 **
    Double.Faults              -0.0018275  0.0007234  -2.526 0.013273 *
    W...1st.Serve.Pts           0.0334651  0.0139064   2.406 0.018152 *
    W...2nd.Serve.Pts           0.0350553  0.0134740   2.602 0.010845 *
    No..Break.Pts.Faced        -0.0044682  0.0007290  -6.129 2.30e-08 ***
    X..Break.Pts.Saved          0.0390309  0.0073757   5.292 8.46e-07 ***
    Service.Gms.Played          0.0010213  0.0004333   2.357 0.020581 *
    Service.Gms.W..            -0.0455592  0.0146466  -3.111 0.002501 **
    No..Break.Pt.Opportunities  0.0047017  0.0004306  10.919  < 2e-16 ***
    X..Break.Pts.Converted      0.0234435  0.0058110   4.034 0.000115 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.2113 on 90 degrees of freedom
    Multiple R-squared: 0.9155, Adjusted R-squared: 0.9071
    F-statistic: 108.3 on 9 and 90 DF, p-value: < 2.2e-16
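A minimal sketch of BIC-based stepwise selection in base R. The slides do not show the call that was used, so this is one plausible way to do it rather than the author's code: `step()` minimises AIC by default, and setting its penalty `k` to `log(n)` turns the criterion into BIC. The data and the names `x1` to `x4` are simulated for illustration.

```r
set.seed(3)
n <- 100

# Simulated data: y depends on x1 and x2; x3 and x4 are pure noise
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n, sd = 0.5)

full <- lm(y ~ ., data = sim)

# k = log(n) makes step() penalise each parameter by log(n), i.e. BIC,
# instead of the default AIC penalty k = 2. The direction is an assumption.
reduced <- step(full, direction = "both", k = log(n), trace = 0)

print(formula(reduced))
kept <- attr(terms(reduced), "term.labels")
print(kept)
```

Because the BIC penalty grows with the sample size, it tends to keep fewer predictors than AIC, which fits the drop from 18 to 9 predictors seen in the reduced model.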
  • Slide 19
  • Unsupervised Learning
    - Cluster Analysis: the process of organizing objects into groups whose elements are similar in some way.
    - Principal Component Analysis: the process of reducing the number of predictor variables into components in order to discover new underlying variables.
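As an illustration of cluster analysis, the sketch below runs hierarchical clustering in base R on simulated players. The slides show a dendrogram but do not state the distance or linkage used, so Euclidean distance on standardized columns and complete linkage are assumptions, as are the two made-up statistics `aces` and `return_pts_won`.

```r
set.seed(4)

# Two hypothetical groups of 10 players each, described by two statistics
x <- rbind(
  cbind(aces = rnorm(10, 600, 30), return_pts_won = rnorm(10, 36, 1)),
  cbind(aces = rnorm(10, 200, 30), return_pts_won = rnorm(10, 42, 1))
)
rownames(x) <- paste0("player", 1:20)

# Standardize the columns so that no variable dominates the distance
d  <- dist(scale(x))                       # Euclidean distances between players
hc <- hclust(d, method = "complete")       # agglomerative clustering

groups <- cutree(hc, k = 2)                # cut the tree into two clusters
print(table(groups))
# plot(hc) would draw the cluster dendrogram
```

Cutting the tree at two clusters recovers the two simulated groups; with real data the number of clusters is usually chosen by inspecting the dendrogram.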
  • Slide 20
  • 2012 Data Cluster Dendrogram
  • Slide 21
  • 2012 Data PCA

    Importance of components:
                              Comp.1    Comp.2     Comp.3     Comp.4     Comp.5
    Standard deviation     2.8192100 2.2102469 1.19581683 1.05021129 0.86589111
    Proportion of Variance 0.4460126 0.2741409 0.08024567 0.06189359 0.04207449
    Cumulative Proportion  0.4460126 0.7201536 0.80039923 0.86229282 0.90436731

                               Comp.6     Comp.7     Comp.8     Comp.9    Comp.10
    Standard deviation     0.70453764 0.61115543 0.57260441 0.47311020 0.37507924
    Proportion of Variance 0.02785484 0.02096021 0.01839932 0.01256079 0.00789475
    Cumulative Proportion  0.93222215 0.95318236 0.97158168 0.98414247 0.99203722

                               Comp.11     Comp.12     Comp.13      Comp.14
    Standard deviation     0.212694821 0.157319153 0.152737548 0.1324079805
    Proportion of Variance 0.002538669 0.001388851 0.001309133 0.0009838313
    Cumulative Proportion  0.994575889 0.995964739 0.997273873 0.9982577041

                                Comp.15      Comp.16      Comp.17      Comp.18
    Standard deviation     0.1281058505 0.0882134002 0.0803585561 0.0199374495
    Proportion of Variance 0.0009209376 0.0004366781 0.0003623736 0.0000223065
    Cumulative Proportion  0.9991786418 0.9996153199 0.9999776935 1.0000000000
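The "Proportion of Variance" and "Cumulative Proportion" rows follow directly from the standard deviations: each component's variance is its squared standard deviation, divided by the total. The sketch below redoes that arithmetic from the 18 standard deviations in the output and finds the first component at which 90%, 95%, and 99% of the variation is reached. The `Comp.N` labels look like output from R's `princomp`, but the slides do not show the call, so that is an assumption.

```r
# Standard deviations copied from the "Importance of components" output
sdev <- c(2.8192100, 2.2102469, 1.19581683, 1.05021129, 0.86589111,
          0.70453764, 0.61115543, 0.57260441, 0.47311020, 0.37507924,
          0.212694821, 0.157319153, 0.152737548, 0.1324079805,
          0.1281058505, 0.0882134002, 0.0803585561, 0.0199374495)

total_var <- sum(sdev^2)                   # total variance across components
prop <- sdev^2 / total_var                 # proportion of variance per component
cum  <- cumsum(prop)                       # cumulative proportion

# First component at which each threshold is reached
thresholds <- c(0.90, 0.95, 0.99)
first_comp <- sapply(thresholds, function(t) which(cum >= t)[1])

print(round(prop[1:5], 4))
print(setNames(first_comp, paste0(thresholds * 100, "%")))
```

The thresholds are reached at components 5, 7, and 10, which agrees with the 2012 entries on the PCA results slide. The total variance comes out at about 17.82 rather than 18, which would be consistent with 18 standardized variables and a divisor of n = 100, although the slides do not say how the data were scaled.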
  • Slide 22
  • 2012 Data PCA Scree Plot
  • Slide 23
  • Slide 24
  • Principal Components

                                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
    Aces                        -0.273  0.000  0.000  0.000  0.000  0.000  0.000
    Double.Faults                0.000  0.000  0.479  0.000  0.000  0.000  0.000
    X1st.Serve..                 0.000  0.000  0.000 -0.858  0.000  0.000  0.000
    W...1st.Serve.Pts            0.000  0.000  0.000  0.301  0.000  0.000  0.000
    W...2nd.Serve.Pts            0.000  0.000  0.000  0.000  0.000 -0.511  0.000
    No..Break.Pts.Faced          0.000  0.000  0.362  0.000  0.000  0.000  0.000
    X..Break.Pts.Saved           0.000  0.000  0.000  0.000  0.000  0.575  0.000
    Service.Gms.Played          -0.334  0.000  0.000  0.000  0.000  0.000  0.000
    Service.Gms.W..              0.000  0.268  0.000  0.000  0.000  0.000  0.000
    Service.Pts.W..              0.000  0.275  0.000  0.000  0.000  0.000  0.000
    W...1st.Serve.Return.Pts     0.000  0.000  0.000  0.000  0.000  0.000  0.601
    W...2nd.Serve.Return.Pts     0.000  0.000  0.000  0.000  0.000  0.000 -0.739
    No..Break.Pt.Opportunities  -0.320  0.000  0.000  0.000  0.000  0.000  0.000
    X..Break.Pts.Converted       0.000  0.000  0.000  0.000  0.841  0.000  0.000
    Return.Gms.Played           -0.333  0.000  0.000  0.000  0.000  0.000  0.000
    W...Return.Gms               0.000 -0.389  0.000  0.000  0.000  0.000  0.000
    W...Return.Pts               0.000 -0.392  0.000  0.000  0.000  0.000  0.000
    Total.Pts.W..               -0.320  0.000  0.000  0.000  0.000  0.000  0.000
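One simple way to turn a loadings matrix into a variable-to-component assignment is to pick, for each variable, the component with the largest absolute loading. The slides do not state the rule that was used, so this is a sketch of one plausible approach. It uses four rows copied from the loadings above and only the first four of the seven components.

```r
# A few rows of the 2012 loadings (columns are Comp.1 to Comp.4 only)
load <- rbind(
  Aces            = c(-0.273, 0.000, 0.000,  0.000),
  Double.Faults   = c( 0.000, 0.000, 0.479,  0.000),
  X1st.Serve..    = c( 0.000, 0.000, 0.000, -0.858),
  Service.Gms.W.. = c( 0.000, 0.268, 0.000,  0.000)
)
colnames(load) <- paste0("Comp.", 1:4)

# For each variable (row), the index of the component with the largest
# absolute loading; the sign of a loading does not matter for assignment
assigned <- apply(abs(load), 1, which.max)
print(assigned)
```

For these four variables the rule gives components 1, 3, 4, and 2, matching the 2012 column of the PCA components results table.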
  • Slide 25
  • Results: Stepwise Regression Results. One X marks each year (2008 to 2012) in which the variable was kept in the reduced model; the placement of the marks under individual years is not preserved in this transcript.

    Predictor Variable                          Years selected
    Number of Aces                              none
    Number of Double Faults                     XX
    1st Serve Percentage                        XX
    Win Percentage of 1st Serve Points          XX
    Win Percentage of 2nd Serve Points          X
    Number of Break Points Faced                XXXXX
    Percentage of Break Points Saved            XXXXX
    Service Games Played                        XXXX
    Win Percentage of Service Games             XXX
    Win Percentage of Service Points            X
    Win Percentage of 1st Serve Return Points   none
    Win Percentage of 2nd Serve Return Points   X
    Number of Break Point Opportunities         XXXXX
    Percentage of Break Points Converted        XXX
    Return Games Played                         XX
    Win Percentage of Return Games              none
    Win Percentage of Return Points             X
    Win Percentage of Total Points              none
  • Slide 26
  • Results: PCA - Percent of Variation Explained

    Year  Scree Plot Elbow / 90% of Variation Explained  95%          99%
    2008  Component 5                                    Component 8  Component 11
    2009  Component 5                                    Component 7  Component 10
    2010  Component 5                                    Component 7  Component 10
    2011  Component 5                                    Component 7  Component 10
    2012  Component 5                                    Component 7  Component 10
  • Slide 27
  • Results: PCA - Components (the component on which each variable loads, by year)

    Predictor Variable                          2008  2009  2010  2011  2012
    Number of Aces                                 2     6     1     1     1
    Number of Double Faults                        3     3     3     3     3
    1st Serve Percentage                           4     4     4     4     4
    Win Percentage of 1st Serve Points             4     2     4     4     4
    Win Percentage of 2nd Serve Points             6     3     6     7     6
    Number of Break Points Faced                   3     3     3     4     3
    Percentage of Break Points Saved               5     6     5     6     6
    Service Games Played                           1     1     1     1     1
    Win Percentage of Service Games                2     2     1     2     2
    Win Percentage of Service Points               2     2     1     2     2
    Win Percentage of 1st Serve Return Points      7     7     5     7     7
    Win Percentage of 2nd Serve Return Points      7     5     7     2     7
    Number of Break Point Opportunities            1     1     1     1     1
    Percentage of Break Points Converted           5     7     5     5     5
    Return Games Played                            1     1     1     1     1
    Win Percentage of Return Games                 2     2     2     2     2
    Win Percentage of Return Points                2     2     2     2     2
    Win Percentage of Total Points                 1     1     1     1     1
  • Slide 28
  • New Underlying Variables
    - Component 1, Physical: Service Games Played, Return Games Played, No. of Break Point Opportunities, Win % of Total Points
    - Component 2, Technical: Win % of Service Games, Win % of Service Points, No. of Aces, Win % of Return Games, Win % of Return Points
    - Component 3, Tactical: No. of Double Faults, No. of Break Points Faced
    - Component 4, Mechanical: 1st Serve Percentage, Win % of 1st Serve Points
    - Component 5, Psychological/Mental: % of Break Points Saved, % of Break Points Converted
  • Slide 29
  • Conclusions
    - The factors that are most important in determining the success of male tennis players almost all deal with break of service.
    - Reducing the dimensionality of the data allows us to identify new underlying variables: Physical, Technical, Tactical, Mechanical, Mental.
  • Slide 30
  • Extensions (nbcsports.msnbc.com)
    - Perform regression analysis on additional response variables, such as win percentage or prize money won.
    - Decompose the data by studying variables that are not accumulated over a season, that is, match by match.
  • Slide 31
  • Questions? Special thanks to Dr. Ernest Fokoue.
    Sources:
    - www.inc-anto.net
    - Probability Formulas and Statistical Analysis in Tennis, Journal of Quantitative Analysis in Sports
    - www.atpworldtour.com
    - http://statracket.net
    - www.stevegtennis.com