Framework for Forecasting Professional Soccer Player Career Paths
Transcript of Framework for Forecasting Professional Soccer Player Career Paths
2015 OptaPro Analytics Forum
Framework for a Player Career Forecast Model
Between Multiple Leagues
Howard Hamilton, Founder, Soccermetrics Research
Developed a statistical framework for forecasting football players' careers, automated with machine-learning techniques.
Inputs
1. Season statistical performance
2. Physical / playing characteristics
Outputs
1. Identify peer group of players with comparable performance
2. Forecast future statistical performance over a limited horizon
3. Translate performance in one domestic league competition to performance in another
Expected Interest: Clubs, Media, Betting, Fantasy
Early Stage: Framework > Results
Main Points
Baseball
1. Similarity Scores (Bill James, 1980s)
2. Vladimir Forecasting System (Gary Huckabay, 1990s)
3. PECOTA (Nate Silver/Baseball Prospectus, 2003)
PECOTA-inspired forecasting models in other sports
1. SCHOENE (Kevin Pelton/Basketball Prospectus/ESPN, mid 2000s)
2. KUBIAK (Aaron Schatz/Football Outsiders, mid 2000s)
3. VUKOTA (Puck Prospectus, 2010)
Individual / team projection models in football
1. Aaron Nielsen (ENB Sports)
• One-year projection of individual/team performance
2. Pérez Sánchez et al. (2013)
• Estimating goal-scoring performance in Spanish league
Forecasting Statistical Performance in Sport
Prior Art
Data scarcity
• Range of seasons
• Statistical categories collected
• League variations
Characteristics of domestic leagues
• Differences in aging curves between leagues
• Would a 'universal' aging curve work? Not sure...
• Statistical translations between leagues
• Some leagues are very connected, others less so
Challenges
Data Source: ENB Soccer Database
• 60,000+ players
• 75 domestic league competitions
• 500+ clubs
Individual season statistics
• 1992-93 to 2011-12 (European)
• 1992 to 2012 (American/Scandinavian/Japanese)
Database Analysis
All players
• Season
• Team
• Competition
• Appearances
• Subs
• Minutes
• Yellows / reds
Field players
• Goals
• Assists
• Shots
• Fouls
Goalkeepers
• Goals allowed
• Clean sheets
• Shots faced
• Wins
• Draws
• Losses
Modeling Components
Normalize statistical categories
• Convert statistical values of players in same competition and season to “standard score”
• Places statistical performances on one standard distribution
• This is what allows us to compare players
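The normalization step can be sketched roughly as follows. This is a minimal illustration, not the presenter's code: the record layout, the `standardize` helper, and the sample values are all assumptions made for the example, and the choice of the population standard deviation is likewise a guess.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize(records, stat):
    """Return z-scores of `stat`, computed within each (competition, season) group."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["competition"], rec["season"])].append(rec[stat])
    # Mean and population standard deviation for every competition-season
    params = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}
    z_scores = []
    for rec in records:
        mu, sigma = params[(rec["competition"], rec["season"])]
        # A zero-variance group carries no information, so map it to 0.0
        z_scores.append((rec[stat] - mu) / sigma if sigma > 0 else 0.0)
    return z_scores

# Hypothetical records, invented purely for illustration
records = [
    {"player": "A", "competition": "League X", "season": "2011-12", "goals": 10},
    {"player": "B", "competition": "League X", "season": "2011-12", "goals": 20},
    {"player": "C", "competition": "League X", "season": "2011-12", "goals": 30},
    {"player": "D", "competition": "League Y", "season": "2011", "goals": 2},
    {"player": "E", "competition": "League Y", "season": "2011", "goals": 4},
]
z = standardize(records, "goals")
```

Because each player is scored against the mean and spread of their own competition and season, a z-score from one league can be placed alongside a z-score from another.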
Identify K comparable players (“nearest neighbors”)
• Consider players of same age and position
• Calculate similarity score between statistical records
• Comparable players: Score about 0.90 - 0.95
• Relax threshold for “unique” players
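A sketch of the comparable-player search might look like the following. The slides do not state how the similarity score is computed, so the formula here (1 divided by 1 plus the Euclidean distance between standardized records) is purely an assumption, as are the function names, the record layout, and the sample data.

```python
import math

def similarity(a, b):
    """Hypothetical similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

def comparable_players(target, pool, k=15, threshold=0.90):
    """Return up to k (name, score) pairs, most similar first."""
    candidates = [
        (similarity(target["z"], p["z"]), p["name"])
        for p in pool
        # Only players of the same age and position are considered
        if p["age"] == target["age"] and p["position"] == target["position"]
    ]
    candidates.sort(reverse=True)
    return [(name, score) for score, name in candidates[:k] if score >= threshold]

# Hypothetical standardized records, invented purely for illustration
target = {"name": "T", "age": 27, "position": "F", "z": [2.0, 1.0]}
pool = [
    {"name": "P1", "age": 27, "position": "F", "z": [2.0, 1.05]},
    {"name": "P2", "age": 27, "position": "F", "z": [1.9, 1.0]},
    {"name": "P3", "age": 27, "position": "F", "z": [0.0, 0.0]},
    {"name": "P4", "age": 24, "position": "F", "z": [2.0, 1.0]},
    {"name": "P5", "age": 27, "position": "D", "z": [2.0, 1.0]},
]
peers = comparable_players(target, pool)
relaxed = comparable_players(target, pool, threshold=0.30)
```

Lowering `threshold`, as in the second call, is one way to read "relax threshold" for players with few close matches.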
Forecast future performance with historical performance of comparable players
• Using regression techniques
• Adjust for aging and regression to mean
• Convert to statistics for league competition of interest
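One possible shape for the forecasting step is sketched below. It swaps in a similarity-weighted average for the unspecified regression technique, and the shrinkage factor, the additive aging term, and all numbers are hypothetical placeholders rather than values from the talk.

```python
def forecast_z(comparables, shrinkage=0.8, aging_delta=0.0):
    """Similarity-weighted mean of the comparables' next-season z-scores.

    shrinkage < 1 pulls the forecast toward the league mean (z = 0);
    aging_delta is a hypothetical additive age adjustment in z units.
    """
    total_weight = sum(sim for sim, _ in comparables)
    weighted = sum(sim * z_next for sim, z_next in comparables) / total_weight
    return shrinkage * weighted + aging_delta

def to_league_units(z, league_mean, league_std):
    """Invert the standard score for the league competition of interest."""
    return league_mean + z * league_std

# (similarity, next-season z-score) for each comparable player; invented values
comparables = [(0.95, 1.5), (0.90, 1.0), (0.90, 0.5)]
z_hat = forecast_z(comparables, shrinkage=0.8, aging_delta=-0.1)
# Hypothetical mean and standard deviation of goals in the destination league
goals_hat = to_league_units(z_hat, league_mean=5.0, league_std=4.0)
```

Working in z-scores until the last step means the same forecast can be expressed in any league's units by changing only `league_mean` and `league_std`.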
$z = (x - \mu) / \sigma$
K-NN
Model Description
Player League Season Similarity
Osvaldo Val Baiano Brazil Serie B 2007 0.961
Wayne Rooney English Premier League 2011-2012 0.957
Oscar Cardozo Portugal Primeira Liga 2009-2010 0.954
Maciej Zurawski Poland Ekstraklasa 2002-2003 0.939
Carlos Tevez English Premier League 2010-2011 0.926
Javi Moreno Spanish Primera 2000-2001 0.925
Katlego Mphela South Africa PSL 2010-2011 0.913
Matt Tubbs England Conference 2010-2011 0.913
Kris Boyd Scotland Premier League 2009-2010 0.905
Goncalves Jonas Brazil Serie A 2010 0.904
Rickie Lambert England League One 2008-2009 0.901
Mario Bermejo Spanish Segunda 2004-2005 0.897
Alan Shearer English Premier League 1996-1997 0.877
Kevin Phillips English Premier League 1999-2000 0.863
Photo by Simon Harriyott
Cristiano Ronaldo: Forward, aged 27 (Spanish Primera 2011/12)
Active Player.
Scored 46 goals in the 2011/12 La Liga season.
Nearest Neighbor Results
Nearest Neighbor groups leading goalscorers at Ronaldo's age
0.96 similarity metric – few players had a season as dominant
Marvin Bejarano: Defender, aged 21 (Bolivia Liga Profesional 2008)
Player League Season Similarity
Fernando Tobio Argentina Primera 2009-2010 0.996
Charlie Wassmer England League Two 2011-2012 0.990
Oswaldo Alanis Mexico Primera 2009-2010 0.985
Jan Vertonghen Netherlands Eredivisie 2007-2008 0.984
Paul Papp Romania Liga I 2009-2010 0.957
Santiago Vergini Paraguay Primera 2009 0.957
Mauricio Casierra Colombia Primera 2006 0.957
Rafael Delgado Argentina Nacional B 2010-2011 0.955
Konstantin Engel Germany 2 Bundesliga 2008-2009 0.954
Jae Sung Lee South Korea K-League 2009 0.953
Koybasi Ismail Turkey Super Lig 2009-2010 0.953
Luke O'Brien England League Two 2008-2009 0.951
Hector Quinones Colombia Primera 2012 0.950
Mate Ghvinianidze Germany 2 Bundesliga 2006-2007 0.950
Franz Schiemer Austria 1 Bundesliga 2006-2007 0.947
Active Player.
Has played for one club over his career.
5 caps for Bolivia.
0.996 similarity metric – very comparable, but limited defensive data
Nearest Neighbor Results
Iker Casillas: Goalkeeper, aged 26 (Spanish Primera, 2006-2007)
Active Player.
Has played for one club over his career.
450+ appearances at Real Madrid, 160 caps for Spain.
Interesting that Gianluigi Buffon is closest comparable at 26 y/o
Nearest Neighbor Results
Player League Season Similarity
Gianluigi Buffon Italy Serie A 2003-2004 0.994
Mark Crossley English Premier League 1994-1995 0.992
Dionissis Chiotis Greece Super League 2002-2003 0.990
Steve Mandanda France Ligue 1 2010-2011 0.989
Marco Wolfli Switzerland Super League 2007-2008 0.989
Shay Given English Premier League 2001-2002 0.986
Guillermo Ochoa Mexico Primera 2010-2011 0.986
Eduardo Martini Brazil Serie A 2004 0.985
Morgan de Sanctis Italy Serie A 2002-2003 0.984
Hiroki Iikura Japan J1-League 2011 0.982
Cesar Lainez Spanish Segunda 2002-2003 0.981
Marcelo Grohe Brazil Serie A 2012 0.981
Hitoshi Sogahata Japan J1-League 2005 0.980
Henri Sillanpaa Finland Veikkausliiga 2004 0.980
Projecting career performance is difficult
• Next steps:
  ● Use nearest neighbors to forecast future performance
  ● Quantify adjustments for age, league quality, position
  ● Create multiple career forecast paths with probabilities
• Limited horizons important (2-3 years)
• Probabilistic projections sensible, not necessarily useful
  • Accuracy vs. clarity
• Diverse range of statistical categories necessary:
  • Attacking and defending contributions and impact
  • Advanced metrics
Data normalization is a necessity!
Club projections are a logical step
Need to enforce a “conservation of goals” in the universe of data in our system, i.e.:
Total goals scored == total goals conceded
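The conservation constraint can be checked, and crudely enforced, along these lines. This is only a sketch: the data layout and function names are assumptions, and rescaling goals conceded is just one hypothetical way to restore the balance.

```python
def goals_balance(team_projections):
    """Total goals scored minus total goals conceded across the league."""
    scored = sum(t["goals_for"] for t in team_projections.values())
    conceded = sum(t["goals_against"] for t in team_projections.values())
    return scored - conceded

def enforce_conservation(team_projections):
    """Rescale goals conceded so the league total equals total goals scored."""
    scored = sum(t["goals_for"] for t in team_projections.values())
    conceded = sum(t["goals_against"] for t in team_projections.values())
    factor = scored / conceded
    return {
        team: {
            "goals_for": t["goals_for"],
            "goals_against": t["goals_against"] * factor,
        }
        for team, t in team_projections.items()
    }

# Hypothetical club projections, invented purely for illustration
projections = {
    "Club A": {"goals_for": 60, "goals_against": 40},
    "Club B": {"goals_for": 45, "goals_against": 50},
    "Club C": {"goals_for": 30, "goals_against": 60},
}
imbalance = goals_balance(projections)
balanced = enforce_conservation(projections)
```

A nonzero `imbalance` signals that independently projected club totals are inconsistent with one another within a closed league.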
Photo by Simon Harriyott
Conclusions
Customization
• Integrate with financial/medical databases, scouting data
• Greatest utility at football operations/sporting director level
Biggest challenge: Data!
• Not just data on all players in league, but players in all other leagues of interest
• Some statistical categories not available in some leagues
• As always, data collection and analysis problems are non-trivial
Photo by JD Hancock
Knowledge Transfer
Thank You!
Special Thanks To:
OptaPro (Invitation to Forum)
Aaron Nielsen (ENB Database access)
Simon Harriyott (Presentation at Forum)
For more information contact Soccermetrics Research
@soccermetrics