(805344378) Discuss the Effectiveness and the Ethics of the Application of Data


Discuss the Effectiveness and the Ethics of the Application of Data Analytics to Football Statistics

By George Stevens

To the majority of people, statistics in football are about as simple as it gets: number of shots, shots on target, possession, corners, red and yellow cards, etc. Many people would say that in a game where anything can happen and luck can play a big part, these statistics don't show anything, and even commentators and pundits regularly dismiss them whenever an anomalous result happens. Considering the basic nature of these figures, those who don't trust them may have a point, but given the considerable investment top-level clubs have made in football analytics, statistics are surely important for more than just a summary or a soundbite.

One application of football statistics is to use them to create a prediction model. Whether you're a typical fan trying to get one over on your friends by predicting results, or someone who wants to make serious money in the football betting industry, a foolproof prediction model is a very attractive idea. If someone were able to create such a model, the possibilities would be endless, but in reality this is a fanciful idea and no such model exists. It is, however, much easier to find a model that fares slightly better than random, uneducated guesses. Every football match has three possible outcomes (a home win, an away win or a draw), so someone guessing at random will, on average, pick the correct outcome one time in three, whatever the result turns out to be. A model that gets more than a third of results correct could therefore be considered successful.
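The one-third baseline can be checked with a quick simulation. This is a minimal Python sketch, and the season of results in it is invented purely for illustration (the home/draw/away split is an assumption, not real Premier League data). Because a uniformly random guess matches any given result with probability 1/3, the guesser's accuracy stays near one third however the results are distributed.

```python
import random

def random_guess_accuracy(results, seed=0):
    """Fraction of matches 'predicted' correctly by picking H, D or A uniformly at random."""
    rng = random.Random(seed)
    outcomes = ("H", "D", "A")
    correct = sum(rng.choice(outcomes) == result for result in results)
    return correct / len(results)

def always_pick_accuracy(results, pick):
    """Fraction of matches predicted correctly by always choosing the same outcome."""
    return results.count(pick) / len(results)

# Hypothetical season of results, deliberately skewed towards home wins.
results = ["H"] * 46_000 + ["D"] * 26_000 + ["A"] * 28_000

print(round(random_guess_accuracy(results), 2))   # close to 0.33
print(always_pick_accuracy(results, "H"))         # 0.46
```

The second figure hints that one third is a low bar: when outcomes are not evenly split, simply backing the most common outcome every time already beats random guessing.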

Some people would consider the bookmakers' most likely outcome of a match to be a reasonable prediction. This is backed up by the data in the Mirror article "Backing the favourite: But in the Premier League do they always win?" [1]. According to this article, the bookmakers' favourite in Premier League matches wins 54.7% of the time, a much better success rate than random guessing would achieve. On the face of it, this does seem like a fairly high success rate; however, it is also reasonable to suggest that a football fan with a good understanding of the Premier League would guess results correctly more often than someone guessing at random. We also need to consider that bookmakers' odds aren't there to help us. Bookmakers need to make a profit, so the odds they offer must reflect that. If a betting company had a very successful prediction model, would it always base its odds on that model? I think that if a bookmaker had a model that predicted results correctly 80% of the time, for example, basing its odds on that model would reduce the money it makes. As soon as customers caught on to the fact that the result the bookmaker says is most likely is almost always the correct one, they would put significant amounts of money on the favourite, knowing that they would win almost every time.
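The bookmaker's profit margin shows up directly in the odds. A common way to read decimal odds is to take 1/odds as the implied probability of each outcome; for a single match these add up to more than 1, and the excess (often called the overround) is the bookmaker's built-in margin. The sketch below assumes decimal odds and uses invented prices for one match; rescaling proportionally is only the simplest of several ways to strip out the margin.

```python
def implied_probabilities(decimal_odds):
    """Raw implied probability of each outcome: 1 / decimal odds."""
    return {outcome: 1 / odds for outcome, odds in decimal_odds.items()}

def overround(decimal_odds):
    """Bookmaker's margin: how far the implied probabilities sum above 1."""
    return sum(implied_probabilities(decimal_odds).values()) - 1

def fair_probabilities(decimal_odds):
    """Remove the margin by rescaling the implied probabilities to sum to 1."""
    raw = implied_probabilities(decimal_odds)
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}

def favourite(decimal_odds):
    """The bookmaker's favourite is the outcome with the shortest (lowest) odds."""
    return min(decimal_odds, key=decimal_odds.get)

# Hypothetical decimal odds for a single match.
odds = {"home": 1.80, "draw": 3.60, "away": 4.50}

print(favourite(odds))                               # home
print(round(overround(odds), 4))                     # 0.0556
print(round(fair_probabilities(odds)["home"], 3))    # 0.526
```

Checking a "back the favourite" strategy then amounts to calling `favourite()` for every match and counting how often that outcome actually happened, a calculation of the same kind as the 54.7% figure in [1].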

Another increasingly popular method of using statistics to predict the outcome of a match is to compare the expected goals of the two teams involved. The expected goals value of a shot, a statistic popularised by analysts such as the football statistician and blogger Michael Caley [2], is the probability, based on historical averages, of that shot going in. It takes into account many factors to determine the probability of a player or team scoring from a chance. These include:

• Shot location e.g. angle, distance

• Shot type e.g. header, foot, free kick

• Assist type e.g. cross, through-ball, rebound, pass

• Speed of attack e.g. counter, possession, ground covered quickly or slowly, long ball

• Dribble by the attacking player, e.g. beating the keeper or a defender

• Set play or open play
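To make the idea concrete, here is a toy version of such a model in Python. It is not Caley's model or any published one: it assumes a logistic (log-odds) form, a common choice for modelling a yes/no outcome like "scored or not", it uses only some of the factors listed above (shot location via distance and angle, shot type, assist type, set play), and every coefficient is invented for illustration.

```python
import math

GOAL_WIDTH_M = 7.32  # distance between the posts

# Hypothetical coefficients, chosen by hand for illustration only. A real
# expected goals model would fit these to a large set of historical shots.
COEFFS = {
    "intercept": -1.0,
    "distance_m": -0.11,   # further from goal -> lower probability
    "angle_rad": 1.2,      # more of the goal visible -> higher probability
    "header": -0.9,        # headers assumed to convert less often than shots with the foot
    "through_ball": 0.5,   # assumed boost for chances created by a through-ball
    "set_play": -0.2,      # assumed small penalty for set-play shots
}

def shot_geometry(x_m, y_m):
    """Distance to the centre of the goal and the angle the goal mouth subtends.

    x_m: distance out from the goal line; y_m: sideways offset from the centre.
    """
    distance = math.hypot(x_m, y_m)
    half = GOAL_WIDTH_M / 2
    angle = math.atan2(GOAL_WIDTH_M * x_m, x_m ** 2 + y_m ** 2 - half ** 2)
    return distance, angle

def shot_xg(x_m, y_m, header=False, through_ball=False, set_play=False):
    """Probability that a shot is scored, from a logistic model of a few shot features."""
    distance, angle = shot_geometry(x_m, y_m)
    z = (COEFFS["intercept"]
         + COEFFS["distance_m"] * distance
         + COEFFS["angle_rad"] * angle
         + COEFFS["header"] * header
         + COEFFS["through_ball"] * through_ball
         + COEFFS["set_play"] * set_play)
    return 1 / (1 + math.exp(-z))  # logistic function squashes z into (0, 1)

print(round(shot_xg(8, 2), 2))                # close-range shot with the foot
print(round(shot_xg(8, 2, header=True), 2))   # same spot, headed
print(round(shot_xg(25, 10), 2))              # long-range effort from wide
```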

This statistic can then be aggregated to calculate the expected goals scored and the expected goals conceded by a particular team. The graph below from 11tegen11.net [3], based on Opta data, shows how well different types of shot statistics correlate with future performance over the course of a season.

From this we can see that, of all the shot statistics, the expected goals ratio is the best predictor of future performance.
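A minimal sketch of that aggregation, assuming each shot already has an expected goals (xG) value: a team's expected goals for a match are the sum of its shots' values, expected goals conceded are the sum of the opponents' shots, and the expected goals ratio is taken here as xG for divided by the total xG in the team's matches. That definition is an assumption, modelled on how shot ratios are commonly defined; [3] may differ in detail. The shot values are invented.

```python
from dataclasses import dataclass

@dataclass
class MatchXG:
    xg_for: float       # summed xG of the team's own shots
    xg_against: float   # summed xG of the opponents' shots

def match_xg(own_shot_xgs, opponent_shot_xgs):
    """Turn per-shot xG values into a team's expected goals for and against in one match."""
    return MatchXG(sum(own_shot_xgs), sum(opponent_shot_xgs))

def xg_ratio(matches):
    """Share of all expected goals in a team's matches that the team itself created."""
    xg_for = sum(m.xg_for for m in matches)
    xg_against = sum(m.xg_against for m in matches)
    return xg_for / (xg_for + xg_against)

# Hypothetical per-shot xG values for one team over two matches.
season = [
    match_xg([0.10, 0.35, 0.05, 0.76], [0.08, 0.12]),
    match_xg([0.20, 0.10], [0.50, 0.30, 0.15]),
]
print(round(xg_ratio(season), 3))  # 0.576 (1.56 xG for, 1.15 against)
```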

However, this isn't a perfect formula; there are anomalies. When two teams play each other, the team with the lower expected goals can still win the game. This does happen, but people do tend to be shocked when such anomalous results come up. Take, for example, the match in which the best team in the world at the time, Barcelona, lost 2-1 to a mediocre Celtic side.

Even though the UEFA stats shown above [4] suggest that Barcelona should have beaten Celtic very comfortably, they didn't. However, the fact that the result was such a surprise supports the expected goals model as a good predictor of match outcomes: the upset was shocking precisely because it went against what the chances in the game suggested, and results like that are rare. The model does also leave out some vital pieces of information, such as the proximity of defenders, the identity of the player taking the shot (when looking at a team rather than an individual) and good chances that don't result in a shot on goal.
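The argument that an upset can be surprising and still consistent with the model can be illustrated by simulating a match from expected goals (xG). This is a sketch under simplifying assumptions: shots are treated as independent, the xG values are invented (they are not the Celtic or Barcelona figures from [4]), and nothing about the game state is modelled. Even so, the side with far less xG should win a noticeable minority of the simulated matches, which is the sense in which a rare result does not by itself discredit the model.

```python
import random

def simulate_match(home_shot_xgs, away_shot_xgs, n_sims=50_000, seed=1):
    """Monte Carlo estimate of (home win, draw, away win) probabilities.

    Each simulated match replays every shot as an independent event that is
    scored with probability equal to its xG.
    """
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        home_goals = sum(rng.random() < xg for xg in home_shot_xgs)
        away_goals = sum(rng.random() < xg for xg in away_shot_xgs)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / n_sims, draws / n_sims, away_wins / n_sims

# Hypothetical shots: a home underdog with three chances (0.50 xG in total)
# against a dominant away side with twelve (1.32 xG in total).
underdog = [0.35, 0.10, 0.05]
favourite = [0.30, 0.25, 0.15, 0.12, 0.10, 0.08, 0.08, 0.06, 0.05, 0.05, 0.04, 0.04]

p_home, p_draw, p_away = simulate_match(underdog, favourite)
print(f"underdog win {p_home:.1%}, draw {p_draw:.1%}, favourite win {p_away:.1%}")
```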

Overall, for a model in its early stages, expected goals appears to be a generally good tool for predicting match outcomes and player and team performances, and it helps the bookmakers stay one step ahead.

The use of expected goals isn't limited to making predictions, though. Expected goals statistics can also be a very good way of measuring the performance of a team or a player. A team or player regularly outscoring their expected goals could be seen as overperforming relative to the chances they are getting, and vice versa. There are several benefits to knowing whether someone is under- or overperforming relative to their expected goals. Say, for example, a player has scored six goals in their last ten games. This is a more than respectable tally, but the goals-per-game ratio does not take into account how many shots they took or how easy the chances they scored from were, so you can't really tell whether they are finishing well or just converting all the really easy chances they have had over that time. If we introduce expected goals into this scenario, we get a much better idea of whether this player is performing well or just performing as well as any average player would given the same chances.
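That comparison is simple arithmetic: add up the expected goals (xG) of every shot a player took over the period and subtract it from the goals they actually scored. The Python sketch below uses invented shot values for two hypothetical players with the same six goals; the numbers are illustrative only.

```python
def finishing_report(goals, shot_xgs):
    """Compare actual goals with the total xG of the shots taken over the same period."""
    expected = sum(shot_xgs)
    return {
        "shots": len(shot_xgs),
        "goals": goals,
        "xg": expected,
        # Positive: more goals than an average finisher would expect from these chances.
        "goals_minus_xg": goals - expected,
    }

# Hypothetical: two players who have each scored six goals in their last ten games.
long_range_shots = [0.05, 0.08, 0.10, 0.12, 0.07, 0.30, 0.06,
                    0.15, 0.09, 0.11, 0.20, 0.04, 0.10, 0.08]   # xG about 1.55
tap_ins = [0.70, 0.65, 0.80, 0.55, 0.60, 0.76, 0.50, 0.45, 0.30]  # xG about 5.31

for name, shots in [("long-range shooter", long_range_shots), ("poacher", tap_ins)]:
    report = finishing_report(6, shots)
    print(f"{name}: {report['goals']} goals from {report['xg']:.2f} xG "
          f"({report['goals_minus_xg']:+.2f})")
```

Identical goal tallies can hide very different stories: the first player has scored far more than their chances would suggest, while the second has done roughly what an average player would with those chances.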

An example that demonstrates how expected goals can be used to judge a player's performance is Gareth Bale in the 2012/2013 Premier League season. That season many said that if it weren't for Gareth Bale, Tottenham would have finished much lower than they did. His performance was so good that it earned him the PFA Player of the Year award and a transfer to Real Madrid for a world-record fee. Some people said that, while he was good, he may not have been worth £75million, but if we take a closer look at his goal-scoring that season, we can see that he was phenomenal. From the 166 chances he had that season, Bale scored twenty goals, a very good total, especially for a winger, but when you consider the quality of these chances by looking at their expected goals, his performance was even more outstanding than it first appears. The following graph from the blog post "Running a Simple Simulation with Excel" by Mark Taylor [5] shows the percentage likelihood of achieving different goal totals based on the expected goals from Bale's chances.

As the graph shows, the chance of scoring twenty goals from those chances is very low, so we can deduce that Bale was significantly overperforming and therefore regularly scoring from difficult chances, which emphasises his quality that season.
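Taylor's post [5] estimates this distribution by simulation in Excel. The same kind of distribution can also be computed exactly, assuming each chance is an independent event scored with probability equal to its expected goals (xG) value. The Python sketch below does this with a simple shot-by-shot recurrence; the 166 xG values are made up and match only the number of chances mentioned above, so the output says nothing about Bale himself.

```python
def goal_total_distribution(shot_xgs):
    """Exact probability of each possible goal total from a list of shots.

    Each shot is treated as an independent event scored with probability equal
    to its xG (a Poisson-binomial distribution). The distribution is built up one
    shot at a time: dist[k] is the probability of exactly k goals so far.
    """
    dist = [1.0]  # before any shots: zero goals with certainty
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # this shot is missed
            new[goals + 1] += prob * p     # this shot is scored
        dist = new
    return dist

def prob_at_least(dist, k):
    """Probability of scoring k or more goals."""
    return sum(dist[k:])

# Hypothetical xG values for 166 chances (invented; not Bale's actual shots).
shots = [0.05] * 130 + [0.12] * 30 + [0.35] * 6
dist = goal_total_distribution(shots)
print(f"expected goals: {sum(shots):.1f}")
print(f"P(20 or more goals): {prob_at_least(dist, 20):.2%}")
```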

Knowing whether a player is over- or underperforming would also be useful in a club's scouting network. Scouts would be able to advise clubs about potential transfers and give detailed information about a player's goal-scoring performance, which could help clubs secure bargains in the transfer market and avoid overpaying for players whose goal-scoring performance is not actually that good.

This isn't a perfect statistic, though; there are limitations to using expected goals to analyse performance. As well as the limitations mentioned earlier, this statistic can only be used to analyse attacking performance. For example, a defensive player could appear to be underperforming according to this method, but if goal scoring isn't really their priority anyway, are they really underperforming? Furthermore, to analyse player performance properly using this method, it would have to be done over a fairly long period of time, because it's not unusual for a player to score lucky goals that you wouldn't expect them to score, and these would distort their perceived performance over a short period.
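The point about short periods can be quantified with a little probability. Assuming shots are independent and each is scored with probability equal to its expected goals (xG) value, a player who finishes exactly as well as average would still see their goals-to-xG ratio wander purely by chance, and the size of that wander shrinks only with the square root of the number of shots. The uniform 0.1 xG per shot below is a simplifying assumption.

```python
import math

def finishing_ratio_spread(shot_xgs):
    """Standard deviation of goals / xG for a perfectly average finisher.

    If every shot is scored independently with probability equal to its xG,
    the goal count has variance sum(p * (1 - p)); dividing its standard
    deviation by the total xG gives the typical spread of the goals/xG ratio.
    """
    total_xg = sum(shot_xgs)
    variance = sum(p * (1 - p) for p in shot_xgs)
    return math.sqrt(variance) / total_xg

# Hypothetical: every shot is worth 0.1 xG.
print(round(finishing_ratio_spread([0.1] * 10), 2))    # 0.95 over ten shots
print(round(finishing_ratio_spread([0.1] * 200), 2))   # 0.21 over 200 shots
```

With the same underlying ability, a goals-to-xG ratio anywhere from roughly 0.05 to 1.95 is unremarkable over ten shots, whereas over 200 shots the typical swing falls to about 0.21 either side of 1, which is why a long sample is needed before calling a player an over- or underperformer.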

Obviously this isn't the only statistic clubs use. In fact, clubs use such a huge range of statistics that the majority of Premier League clubs have a partnership with the football data analytics company ProZone, and in 2014 Arsenal spent over £2million buying a similar company, StatDNA, for their analytics [6]. At first, some managers were unconvinced by data analytics. According to the Wired.co.uk article "The winning formula: data analytics has become the latest tool keeping football teams one step ahead" by Joao Medeiros [7], in 2005 Southampton hired ProZone consultant Simon Wilson to give advice to the team. The manager at the time, Harry Redknapp, was not a fan. On one occasion, Wilson gave a briefing to the team and the manager to try to help them before a match, which Southampton subsequently lost 3-2. On the team bus, Redknapp turned to Wilson and said, "I'll tell you what, next week, why don't we get your computer to play against their computer and see who wins?"

The same article [7] explains how several managers did embrace data analytics. One of the first major cases was Sam Allardyce in 2000. At the time he was managing Bolton Wanderers, who were then in the second tier of English football (the division now known as the Championship). ProZone weren't sure whether a club of Bolton's stature would be able to afford their services, but they thought that if it worked well for a club in Bolton's position, it would give them even better publicity than if it worked for a club like Arsenal or Manchester United that was already successful. That season, Bolton were promoted to the Premier League, where Allardyce based his game plan around the statistics provided. For example, he insisted on his players taking in-swinging corner kicks and crosses, as a larger proportion of goals are scored this way, and made sure they practised defending them in training. This led to Bolton achieving a consistent record of top-eight Premier League finishes between 2003 and 2007 and qualifying for the UEFA Cup (now the Europa League) for the first time in their history.

Nowadays, analytics is hugely important for football clubs, but it doesn't come cheap, which raises the question: is it fair that this in-depth data is only available to clubs that can afford it? On the one hand, you can't expect companies such as ProZone and StatDNA to give away the data they've collected for free, and you could argue that clubs are just using their budget for the good of the team in the same way as when buying players. On the other hand, this could further widen the gap between big clubs and small clubs, and it could become a vicious circle for less wealthy clubs. Clubs that can't afford the best data analytics could be at a disadvantage, and if they struggle to progress because of this, they probably won't see a large enough increase in revenue to afford the analytics, so they will continue to struggle.

All in all, while there are still some sceptics when it comes to the use of statistics in football, its growing importance cannot be denied. With clubs like Arsenal prepared to pay over £2million to acquire data analytics companies, and other Premier League clubs using companies like ProZone, the trend doesn't look like dying down either. Whilst there are issues related to the fairness of richer clubs being able to afford the best data analytics, these clubs are just using the money available to them, and there are many bigger issues related to money in football than the use of data analytics. Overall, I think that the growing interest from fans, coupled with the increasing use of big data in football, means that the use of statistics is only going to become more popular, and as a mathematics student who is passionate about football, I'm more than OK with that.


References

1. Backing the favourite: But in the Premier League do they always win?
By David Dubas-Fisher
http://www.mirror.co.uk/sport/football/news/backing-favourite-premier-league-always-4475089
Accessed: 04/04/2016

2. Let's Talk about Expected Goals
By Michael Caley
http://cartilagefreecaptain.sbnation.com/2015/4/10/8381071/football-statistics-expected-goals-michael-caley-deadspin
Accessed: 10/03/2016

3. The Best Predictor for Future Performance is Expected Goals
By "11tegen11"
http://11tegen11.net/2015/01/05/the-best-predictor-for-future-performance-is-expected-goals/
Accessed: 05/05/2016

4. Celtic vs Barcelona 07/11/2012
http://www.uefa.com/uefachampionsleague/season=2013/matches/round=2000347/match=2009541/postmatch/statistics/
Accessed: 06/04/2016

5. Running a Simple Simulation with Excel
By Mark Taylor
http://thepowerofgoals.blogspot.co.uk/2016/01/running-simple-simulation-with-excel.html
Accessed: 06/04/2016

6. Arsenal's 'secret' signing: Club buys £2m revolutionary data company
By David Hytner
http://www.theguardian.com/football/2014/oct/17/arsenal-place-trust-arsene-wenger-army-statdna-data-analysts
Accessed: 04/04/2016

7. The winning formula: data analytics has become the latest tool keeping football teams one step ahead
By Joao Medeiros
http://www.wired.co.uk/magazine/archive/2014/01/features/the-winning-formula
Accessed: 03/04/2016