DUG#6 - Predicting the 2015 Rugby World Cup Results

A little Rugby with Data Science Studio!

Transcript of DUG#6 - Predicting the 2015 Rugby World Cup Results

Page 1: DUG#6 - Predicting the 2015 Rugby World Cup Results

A little Rugby with Data Science Studio!

Page 2: DUG#6 - Predicting the 2015 Rugby World Cup Results

Introduction

Why? Because of the analysis of the last Rugby World Cup by this guy: http://andrewyuan.github.io/EDAV-project.html

Outline:

•Getting Data

•Exploring the Data

•Building the preliminary model

•Discussing limits and possible improvements of the model

Page 3: DUG#6 - Predicting the 2015 Rugby World Cup Results

Collecting Data

•Web scraping in Python (BeautifulSoup + urllib2)

•Thanks to rugbydata.com: http://www.rugbydata.com/italy/romania/gamesplayed/

•Easy to parse!
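A minimal sketch of the parsing step. The slide names BeautifulSoup and urllib2; this example swaps in the standard-library `html.parser` so that it runs without extra packages. The table layout (date, team 1, score 1, team 2, score 2), the sample rows and the names `RowParser` and `parse_games` are assumptions for illustration, not the actual structure of rugbydata.com.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect every table row as a list of cell strings (<td> cells only)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_games(html):
    """Turn an HTML results table into a list of game dictionaries."""
    parser = RowParser()
    parser.feed(html)
    games = []
    # Assumed column order: date, team1, score1, team2, score2.
    for date, team1, score1, team2, score2 in parser.rows:
        games.append({
            "date": date,
            "team1": team1, "score1": int(score1),
            "team2": team2, "score2": int(score2),
        })
    return games


# Hypothetical page fragment; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Team 1</th><th>Score</th><th>Team 2</th><th>Score</th></tr>
  <tr><td>2000-01-01</td><td>Team A</td><td>20</td><td>Team B</td><td>10</td></tr>
  <tr><td>2001-01-01</td><td>Team B</td><td>15</td><td>Team A</td><td>15</td></tr>
</table>
"""

games = parse_games(SAMPLE_HTML)
```

In a real run the HTML string would come from an HTTP request to the games page rather than from `SAMPLE_HTML`.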

Page 4: DUG#6 - Predicting the 2015 Rugby World Cup Results

Team Dataset

Page 5: DUG#6 - Predicting the 2015 Rugby World Cup Results

Games Dataset

Page 6: DUG#6 - Predicting the 2015 Rugby World Cup Results

Exploring some features with graphs… and getting counterintuitive results!

Page 7: DUG#6 - Predicting the 2015 Rugby World Cup Results

Average number of points per game

Japan, Argentina and Namibia score the most points per game, while the Six Nations teams score the fewest…
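As a rough illustration, the average could be computed from the games dataset as follows. The record layout (`team1`, `score1`, `team2`, `score2`) and the three toy games are assumptions, not the real data.

```python
from collections import defaultdict


def average_points(games):
    """Return the mean number of points scored per game for each team."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for g in games:
        # Each game contributes one score to each of the two teams.
        for team, points in ((g["team1"], g["score1"]), (g["team2"], g["score2"])):
            totals[team] += points
            counts[team] += 1
    return {team: totals[team] / counts[team] for team in totals}


# Hypothetical toy data, for illustration only.
toy_games = [
    {"team1": "A", "score1": 20, "team2": "B", "score2": 10},
    {"team1": "A", "score1": 30, "team2": "C", "score2": 20},
    {"team1": "B", "score1": 15, "team2": "C", "score2": 15},
]

averages = average_points(toy_games)
```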

Page 8: DUG#6 - Predicting the 2015 Rugby World Cup Results

Graph of games played

South Africa and Japan have never played each other!
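One possible way to spot such pairs is to build an undirected graph of opponents and list the pairs with no edge. The record layout and the toy games below are hypothetical.

```python
from itertools import combinations


def opponents_graph(games):
    """Map each team to the set of teams it has played at least once."""
    graph = {}
    for g in games:
        graph.setdefault(g["team1"], set()).add(g["team2"])
        graph.setdefault(g["team2"], set()).add(g["team1"])
    return graph


def never_played(graph):
    """List the pairs of teams that have no game in common, in sorted order."""
    return [(a, b) for a, b in combinations(sorted(graph), 2) if b not in graph[a]]


# Hypothetical toy data, for illustration only.
toy_games = [
    {"team1": "A", "team2": "B"},
    {"team1": "A", "team2": "C"},
    {"team1": "C", "team2": "D"},
]

missing_pairs = never_played(opponents_graph(toy_games))
```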

Page 9: DUG#6 - Predicting the 2015 Rugby World Cup Results

Predicting the outcome of a game

•Outcome of a game: 0 if team 1 loses, 1 if team 1 wins.

•Choosing the features:
 -History of games (weighted or not)
 -History of points (weighted or not)
 -History of head-to-head games (weighted or not)
 -Home game or not
 -Winning streak

•Particular precaution: no features such as the number of games played.
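A minimal sketch of how such features might be built. The slides do not say how the weighting is done, so the exponential decay, the 0.5 default for a team with no history and all function and feature names are assumptions. Every history feature is a rate rather than a count, in line with the precaution about the number of games played.

```python
def weighted_win_rate(results, decay=0.9):
    """Win rate over a list of 1/0 outcomes (oldest first).

    With decay < 1 recent games weigh more; decay = 1.0 gives the plain rate.
    Returns 0.5 when there is no history (an assumed neutral default).
    """
    if not results:
        return 0.5
    n = len(results)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * r for w, r in zip(weights, results)) / sum(weights)


def win_streak(results):
    """Number of consecutive wins at the end of the history."""
    streak = 0
    for r in reversed(results):
        if r != 1:
            break
        streak += 1
    return streak


def build_features(team1_results, team2_results, head_to_head, team1_at_home):
    """Feature dictionary for one game, seen from team 1's side.

    head_to_head holds 1/0 outcomes of past games between the two teams,
    with 1 meaning that team 1 won.
    """
    return {
        "t1_win_rate": weighted_win_rate(team1_results, decay=1.0),
        "t1_weighted_win_rate": weighted_win_rate(team1_results),
        "t2_win_rate": weighted_win_rate(team2_results, decay=1.0),
        "t2_weighted_win_rate": weighted_win_rate(team2_results),
        "h2h_weighted_win_rate": weighted_win_rate(head_to_head),
        "t1_home": 1 if team1_at_home else 0,
        "t1_streak": win_streak(team1_results),
        "t2_streak": win_streak(team2_results),
    }


# Hypothetical histories, for illustration only.
features = build_features([0, 1, 1], [1, 0], [], team1_at_home=True)
```

The same weighting could be applied to points scored and conceded to obtain the points-history features.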

Choice of algorithm: Random Forest

Page 10: DUG#6 - Predicting the 2015 Rugby World Cup Results

Feature importance

Accuracy: 0.7

Page 11: DUG#6 - Predicting the 2015 Rugby World Cup Results

ROC Graph
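For reference, accuracy and the area under the ROC curve can be computed from predicted probabilities as sketched below. The labels and probabilities are made-up toy values, not the model's output, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of games whose thresholded prediction matches the outcome."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Area under the ROC curve, computed as the probability that a random
    positive example is scored above a random negative one (ties count half)."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy values, for illustration only.
y_true = [1, 0, 1, 0, 1]
y_prob = [0.9, 0.4, 0.6, 0.7, 0.3]

toy_accuracy = accuracy(y_true, y_prob)
toy_auc = roc_auc(y_true, y_prob)
```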

Page 12: DUG#6 - Predicting the 2015 Rugby World Cup Results

Results (good and not so good)

Assessing your predictions:
 -Common sense
 -Bookmakers

Comparison with four games played so far:

France vs Italy: 0.881 (bookmaker: 0.909)

England vs Fiji: 0.880 (bookmaker: 0.933)

New Zealand vs Argentina: 0.943 (bookmaker: 0.980)

South Africa vs Japan: 0.496 (bookmaker: 0.964)
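The gap between the model and the bookmakers can be made explicit with a few lines of arithmetic on the four figures above. The variable and function names are illustrative.

```python
# (game, model probability, bookmaker probability), as given on the slide.
predictions = [
    ("France vs Italy", 0.881, 0.909),
    ("England vs Fiji", 0.880, 0.933),
    ("New Zealand vs Argentina", 0.943, 0.980),
    ("South Africa vs Japan", 0.496, 0.964),
]


def gaps(rows):
    """Absolute difference between model and bookmaker, rounded to 3 decimals."""
    return {game: round(abs(model - book), 3) for game, model, book in rows}


differences = gaps(predictions)
largest_gap_game = max(differences, key=differences.get)
```

The first three gaps stay below 0.06, while South Africa vs Japan differs by 0.468.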

Page 13: DUG#6 - Predicting the 2015 Rugby World Cup Results

Limits and possible improvements

• Predictions aren’t very good when there are very few direct games between the two teams (Namibia and Japan for example)

• Adding the global rankings

• No possible simulation on the long term

• Doesn’t take into account bonus/malus

• Adding new features (teams in common…)

• Taking into account the players that compose the team:

Page 14: DUG#6 - Predicting the 2015 Rugby World Cup Results

Thank you

And “ALLEZ LES BLEUS”