
Predicting player movements in soccer using Deep Learning

A comparative study between LSTM and GRU on a real-life sports case

Joris Verpalen

ANR: 115394

Thesis submitted in partial fulfillment

of the requirements for the degree of

Master of Science in Communication and Information Sciences,

Master track Data Science Business & Governance,

at the school of humanities and digital sciences

of Tilburg University

Thesis Committee:

Prof. Eric Postma & Sebastian Olier

May 13th, 2019


1. Preface I hereby present my master's thesis on predicting player movements in soccer using deep learning. This study is performed in partial fulfillment of the requirements for the degree of Master of Science in

Communication and Information Sciences and, more specifically, for the master track Data Science

Business and Governance (DSBG) at Tilburg University.

I want to thank my supervisors Prof. Dr. Eric Postma and Sebastian Olier for their feedback, guidance

and support on writing this thesis. Additionally, I would like to thank A.C. Vu, a friend of mine who was

willing to help with cleaning and constructing the .csv-data used for this study. I also want to thank I.

Maat, a friend of mine who works as an app developer and advised me on the programming issues I

encountered during this process.


2. Abstract Analytics is becoming increasingly important in the domain of sports. The way analytics in sports

is performed has changed rapidly in recent years. This is mainly because of the availability of better

technology and due to the development of many applications in the field of computer science. One

way of analyzing sports and performance is through the prediction of players' positions and their

movement in the field. The most well-known application where predicting future movement is an

important aspect is within multi-object tracking. Recently, deep learning techniques have often been applied to do this. LSTM and GRU are mainly used for this, since these methods are well known for dealing with

long-term dependencies in a reliable fashion. However, these techniques have, to the best of my

knowledge, not yet been applied in a multi-object tracking setting in a real-life soccer case. Therefore,

this study aims to compare the performance of the LSTM and the GRU in a real-life soccer setting using

sensor data of the players' positions and pave the way for multi-object tracking using deep learning in

soccer. The research question of this study is as follows:

"To what extent does the use of LSTM and GRU contribute to the prediction of player movements in

soccer?"

In order to answer this research question, we used a data set that contains the x, y coordinates of

soccer players of a professional soccer club in Norway. After pre-processing the data, six experiments are

performed to test the predictive ability of the LSTM and GRU in this setting. The results of the first two

experiments show that the data needs a more appropriate scaling in order to be suitable for learning

and prediction. Therefore, the data is modified to absolute changes in coordinates between two

measurements. Based on this data, four experiments are performed: 1) varying the number of

timestamps as input sequences, 2) doubling the time between timestamps, 3) testing how far into the future we can predict, and 4) predicting the players’ trajectories 40 timestamps into the future. The results

show that both the LSTM and the GRU are well capable of predicting the next change in the players'

positions. However, on longer input sequences, the predictive performance decreases. This can be due

to the depth of the models, which can be seen as a limitation of this study. However, on average the

LSTM and GRU perform equally well - meaning that they predict the future position of the soccer

players with a low error rate compared to the benchmark - and therefore, we can conclude that both

the LSTM and the GRU would be suitable deep learning techniques for predicting movement in a soccer

setting. This study only used sensor data of the soccer players to predict movement. In multi-object

tracking, however, usually visual data is taken as input. Therefore, the most important suggestion for

future research is to use video or image data as input to extract the features (i.e., soccer players) and

use this as input for prediction. In this way, a next step of applying deep learning techniques to multi-

object tracking in soccer can be realized.


Contents

1. Preface
2. Abstract
3. Predicting player movement in soccer
3.1 Research question
3.1.1 Goal
3.1.2 Research question
3.2 Overview study
3.3 Approach
4. Related work
4.1 Predicting future movement and positions
4.2 Predicting future movement and positions in sports
4.3 The use of deep learning to predict future movement and positions
5. Methods
5.1 Data set description
5.2 Pre-processing data
5.3 Algorithms and software
6. Experimental set-up
6.1 Model description
6.2 Hyperparameter settings
6.3 Experiments to perform
6.4 Evaluation metrics
6.4.1 Metrics
6.4.2 Benchmark
7. Results
7.1 Experiment 1: Varying number of timestamps as input sequences
7.2 Experiment 2: Doubling the time between timestamps
7.3 Experiment 3: Varying number of timestamps as input sequences when using absolute differences between coordinates
7.4 Experiment 4: Doubling the time between timestamps when using absolute differences between coordinates
7.5 Experiment 5: How far can we predict?
7.6 Experiment 6: Predicting longer trajectories
8. Discussion & Limitations
9. Conclusions
10. Future work
11. References


3. Predicting player movement in soccer The analysis of team and player performance in sports has changed rapidly in recent years. This is

mainly because of the availability of better technology and due to the development of many

applications in the field of computer science (Cust, Sweeting, Ball, & Robertson, 2019). In soccer the

demand for automated analysis has increased rapidly because it provides valuable information in at

least two ways. First, it provides information for managers and athletes in terms of individual or team

performance and for the development of proper tactics. Second, it can provide useful insights for

spectators in order to help them to better understand a soccer match (Kim, Moon, Lee, Nam, & Jung,

2018).

One way of analyzing sports and performance is through the prediction of players' positions and their

movements in the field. The most well-known application where predicting future movement is key is

in object tracking, which is a domain within computer vision. The goal of object tracking is to infer

trajectories of persons as they move around (Sadeghian, Alahi, & Savarese, 2017). This basically means

that the interest is in predicting the next movement and the accompanying new location of the object.

In object tracking, we can distinguish between the use of two types of data. First, visual data such as

videos or images are widely used in many applications in sports analytics. Second, IMUs (Inertial

Measurement Units), such as sensors are widely used as well in sports analytics. Here, athletes are

usually equipped with sensor belts in order to gather information about their position, speed, distance

covered, heart rate, and so on. So these sensors are able to collect information about many things such

as movement patterns in a reliable fashion (Camomilla, Bergamini, Fantozzi, & Vannozzi, 2018).

A development in analytics is that deep learning is being used more frequently. Using predictive

approaches with deep learning on IMUs or sensor data is a rapidly developing field of interest as well

(Xiang, Alahi, & Savarese, 2015). Examples of deep learning applications in multi-object tracking can

be found outside of the sports domain. In particular, predicting the movements of pedestrians, cars

and so on is widely studied (Kim, et al., 2017) (Alahi, et al., 2016). Especially predicting movement of

pedestrians has many similarities to a soccer setting. First, interest is in multiple objects (i.e., people)

who look very similar (e.g., due to matching outfits). Second, in both settings people are very close to

each other (Gade & Moeslund, 2018). Third, in tracking pedestrians as well as athletes, sudden changes

in motion appear very often which makes it harder to infer trajectories and predict their next position

(Ben Shitrit, Berclaz, Fleuret, & Fua, 2011).

In multi-object tracking results have shown that convolutional neural networks and recurrent neural

networks perform reasonably well (Ma, et al., 2018). Based on this, (Ma, et al., 2018) opted for the

use of GRUs (gated recurrent units) and found better results compared to other deep learning


methods. Other studies on multi-object tracking opt for the use of LSTM (long short-term memory) in

order to infer trajectories (Kim, et al., 2017) (Alahi, et al., 2016) (Xue, Huynh, & Reynolds, 2018). From

a theoretical point of view this makes sense because gated recurrent units and long short-term

memory are better able to deal with long-term dependencies as opposed to recurrent neural

networks. Within soccer the application of deep learning methods to predict movement and future

positions appears to be limited. Some examples can be found in other sports, such as basketball, where (Shah & Romijnders, 2016) applied long short-term memory to model sequences of a basketball in the field. Here, the model learns the trajectories of the basketball with only the coordinates of the ball as

input variables (Shah & Romijnders, 2016).

Since the application of deep learning methods to predict future movement and positions in soccer

seems to be relatively new, we will attempt to pave the way for this in a real-life soccer setting. In this

study we will compare the application of LSTM with the use of GRU to infer trajectories and predict

movement of multiple athletes. These methods have, to the best of my knowledge, not been applied

to a real-life soccer case. Therefore, it would be interesting to compare these models, which have

already proved themselves worthy in other settings, and see how they perform in a real-life soccer

setting.

In their paper, (Pettersen, et al., 2014) presented a dataset that contains sensor data (i.e., GPS

coordinates of the soccer players' positions). Therefore, predicting movement and future positions of

all the soccer players using their sensor dataset could be an interesting application that would

contribute to the sports analytics domain and pave the way for visual multi-object tracking (i.e., using

video or image data) in a soccer setting.

3.1 Research question

3.1.1 Goal The goal of this study is to investigate whether the use of LSTM and GRU, which have been

applied in other settings before, can contribute to the prediction of future movements of soccer

players. Therefore, the research question of this study can be defined as follows:

3.1.2 Research question Research question: “To what extent does the use of long short-term memory and gated recurrent unit

contribute to the prediction of player movements in soccer?”

3.2 Overview study The rest of this study will be organized as follows: In chapter 4, an overview of the related work

is presented. Here, an extensive review of the relevant literature is presented. In chapter 5, we start


by describing the data set used for this study. Additionally, a brief explanation of the data pre-

processing is given and, moreover, the algorithms and software used for this study are described. Next,

chapter 6 presents the experimental set-up and the metrics used for evaluating the performance of

the methods. Subsequently, the results of the performed experiments are presented, compared and

visualized in chapter 7 and these results will be discussed in chapter 8. Here, the limitations of this

study will be discussed as well. In chapter 9 the conclusions that can be drawn from this study are

elaborated on and, additionally, recommendations for future work will be given in chapter 10. Finally,

an overview of the references used in this study is presented in chapter 11.

3.3 Approach The approach for answering the research question of this study starts by delving further into the work that has already been done on this topic. This will give us a good idea of what has been done on this topic already and how we can embed our study in the literature and contribute to the further

development of analytics in sports using deep learning. Then we will explore our dataset and, if

needed, modify the data to make it appropriate to use for this study. When the data is prepared and

ready for use we will evaluate the available code from GitHub, Deep Learning lectures at Tilburg

University and other sources for constructing the LSTM and Gated Recurrent Unit models. When the

models are constructed, we will run them on our dataset and evaluate their performance by

comparing them to each other. This allows us to create a base-case for predicting players’ movements

using deep learning in soccer. The results of this will provide us with the input to draw conclusions,

elaborate on limitations, and provide the reader with some suggestions for future work.

4. Related work This chapter gives an overview of the related work that already has been done within the

domain of this study. We will discuss the following three things: First, we define the domain of

predicting future movements. Second, we make a link to sports and elaborate on the challenges that

are associated with predicting future movements in sports. Third, we conclude this chapter by

elaborating on the trend of applying deep learning techniques to predict movement and future

positions and what work has been done in and outside the domain of sports.

4.1 Predicting future movement and positions The problem of tracking moving targets or predicting future movements is a field that is

actively studied nowadays. Applications can be found in autonomous robots such as self-driving cars

where the importance is in predicting the future trajectories of other objects or humans in order to

avoid collisions and so on (Nikhil & Tran Morris, 2018). This problem of trajectory prediction can be


viewed as a sequence generation task, where the main interest is in predicting the future position at

different time-instances of people or objects based on their past positions (Alahi, et al., 2016). The

most widely recognized field where predicting future movement is the main interest is in object

tracking, a domain in the field of computer vision. The goal of object tracking is to infer trajectories of

objects or persons as they move around (Sadeghian, Alahi, & Savarese, 2017). The basic idea here is to

predict the next movement and the accompanying new position or location of the object. Especially

multi-object tracking is a challenging task since it involves multiple objects or persons to be tracked

instead of just one (Kim, Moon, Lee, Nam, & Jung, 2018).

From the literature we can derive that usually two types of data are used for object tracking. First,

visual data such as videos or images. This is the most used data type for this purpose. Second, IMUs

(Inertial Measurement Units), such as sensors are sometimes used as well. Here, the objects are

equipped with tools in order to gather information about their position, speed, distance covered, and

many more. With these tools we are able to collect information about many things such as movement

patterns in a reliable fashion (Camomilla, Bergamini, Fantozzi, & Vannozzi, 2018). Considering the fact

that the development of algorithms to model sensory inputs and infer predictions based on that is still

an unsolved problem (Felsen, Agrawal, & Malik, 2017), it would be an interesting area for further study.

In the next paragraph, we will narrow this problem down to the sports analytics domain.

4.2 Predicting future movement and positions in sports In many team sports one team tries to score a goal and the defending team constantly tries to

estimate the next move of the attacking team in order to prevent them from scoring and vice versa.

This can be explained as the human activity of making predictions or inferences about the future and acting accordingly based on these inferences (human intelligence). Automated analysis of player movements

can help managers and athletes to be better able to make tactical decisions. This will benefit individual

as well as team performance. The analysis of team and player performance in sports has changed

rapidly in recent years. This is mainly because of the availability of better technology and due to the

development of many applications in the field of computer science (Cust, Sweeting, Ball, & Robertson,

2019).

One of the tools for analyzing sports are automatic tracking systems. With the use of player tracking,

valuable information can be discovered about a player or team's movement during a match (Lara,

Vieira, Misuta, Moura, & Barros, 2018). Tracking systems where predictions are made about players'

or the ball's trajectories are studied for many sports, including American football (Lee & Kitani, 2016),

basketball (Shah & Romijnders, 2016) (Zhao, Yang, Chevalier, Shah, & Romeijnders, 2018), soccer

(Barber & Carré, 2010), and table tennis (Zhang, Xu, & Tan, 2010). Object tracking in team sports is

particularly difficult. For example, in soccer multi-object tracking (i.e., tracking multiple players at the


same time) is particularly challenging due to the number of objects to be tracked (Pettersen, et al.,

2014). Additionally, soccer is a dynamic sport where abrupt changes in a player's motion occur

frequently, just as there are many cases where players are very close to each other which makes it

harder to determine which trajectory belongs to which player (Kim, Moon, Lee, Nam, & Jung, 2018).

4.3 The use of deep learning to predict future movement and positions Applying predictive approaches such as multi-object tracking with deep learning is a rapidly

developing field of interest (Xiang, Alahi, & Savarese, 2015). Therefore, we will now discuss some of

the applications where deep learning is being applied within the domain of predicting future

movement and positions. We also look into the applications of deep learning methods for multi-object

tracking within sports.

Examples of deep learning applications to multi-object tracking can be found in multiple domains; in particular, predicting the movements of pedestrians, cars, and so on is widely studied (Kim, et al., 2017)

(Alahi, et al., 2016). Especially inferring trajectories of pedestrians has many similarities to team sports

such as soccer. One reason for this is that in both cases, interest is in tracking multiple objects who

look very similar in a crowded space. Second, in both settings, the objects to be tracked are very close

to each other (Gade & Moeslund, 2018). Third, in tracking pedestrians as well as athletes like soccer

players, sudden changes in motion occur frequently which makes it harder to infer trajectories and

predict the future positions (Ben Shitrit, Berclaz, Fleuret, & Fua, 2011).

In multi-object tracking, results have shown that convolutional neural networks and recurrent neural

networks perform reasonably well. However, (Ma, et al., 2018) opted for the use of GRUs and found

better results compared to other deep learning methods. Other studies that focus on multi-object

tracking opt for the use of LSTM in order to infer trajectories and predict future movement and

positions (Kim, et al., 2017) (Alahi, et al., 2016) (Xue, Huynh, & Reynolds, 2018). A better performance

when using GRUs as well as LSTMs compared to RNNs or CNNs makes sense from a theoretical point of

view because GRUs and LSTMs are better capable of dealing with long-term dependencies as opposed

to recurrent neural networks, for example. Multi-object tracking using deep learning has been applied

in many sports domains already. In basketball, for example, researchers used LSTM to infer

trajectories of the basketball itself with only the coordinates of the basketball as input variables (Shah

& Romijnders, 2016) (Zhao, Yang, Chevalier, Shah, & Romeijnders, 2018). Another example is in water

polo where (Felsen, Agrawal, & Malik, 2017) used RNNs to track water polo players. Also in volleyball,

trajectory prediction of the volleyball is performed using neural networks (Suda, Makino, & Shinoda,

2019).

However, the use of LSTMs and GRUs for multi-object tracking in a sports setting, and especially in

soccer, appears to be limited thus far. Since the application of these deep learning methods to predict


future movement and positions in a soccer setting seems to be relatively new, this study attempts to

pave the way for this in a real-life soccer case. These methods have, to the best of my

knowledge, not been performed in such a setting using only sensor data of the players' positions in the

field. Using LSTM and GRU on only sensor data could be a first step in multi-object tracking using these

methods in a real-life soccer setting.

5. Methods In this section, we turn to describing our research method. We start by giving a description of the

data set that is used for this study. Furthermore, we explain how this data is pre-processed in order to

make it suitable for the experiments conducted within this study and the accompanying research goal.

We conclude this chapter by briefly explaining the algorithms and software used for this study. A more

in-depth explanation of the algorithms and software with the accompanying experiments will be given

in chapter 6.

5.1 Data set description The dataset used for this study is a player positions dataset (Pettersen, et al., 2014) of

professional soccer players. The data is gathered at the Alfheim Stadium, which is the stadium of a

professional soccer club in Norway, Tromsø IL. An overview of the size of the pitch can be found in

figure 1. The data set contains body sensor data of the players of Tromsø IL. This data contains

information about their position on the field, using Cartesian coordinates1, speed, acceleration, sprint

distance, heading, direction, energy consumption, total distance covered, a unique player ID

(anonymized) and a timestamp. This data is gathered with a ZXY Sports Tracking System. The data is

captured every 5 milliseconds and the data gathered is stored in csv-files. Additionally, the tag-ids of the players have been randomized in order to anonymize the individual soccer players. An example of the raw sensor data can be found in figure 2.

Figure 1: An overview of the pitch at Alfheim Stadium, Norway (Pettersen, et al., 2014). From this we can derive that the x, y coordinates are between (0, 0) and (105, 68).

1 Cartesian coordinates specify each object (i.e., soccer player) by a set of numerical values, which are the shortest distances from the object to the reference point.

Timestamp         ID  X_pos    Y_pos     Heading  Direction  Energy    Speed   Total_distance
7-11-2013 21:05    2  35.2178  30.15277  1.82346  -1.7425    142.9266  0.3286  203.4843
7-11-2013 21:05    6  47.1549  19.9495   2.1015    2.0128    243.2256  0.5106  278.1391
7-11-2013 21:05    7  41.7908  49.8916   1.9047   -1.6392    155.0570  0.5365  380.0208
7-11-2013 21:05    8  53.2812  41.9960   2.6029   -3.0881    165.0328  1.0327  287.2322
7-11-2013 21:05   10  39.4537  28.8088   1.2822    2.7842    330.9677  1.4894  333.4538

Figure 2: An example of the raw ZXY sensor data, where the columns describe the following from left to right: the timestamp of the data gathered, the anonymized player ID, the position in meters on the x-axis of the field, the position in meters on the y-axis of the field, the heading and direction in which the player is going, the energy level, the speed, and the total distance covered so far.

The ZXY data are captured every 5 milliseconds. For example, the first measurement is taken at 2013-

11-03 18:01:09, whereas the second measurement takes place at 2013-11-03 18:01:09:05. However,

it occurs relatively often that not all players’ coordinates are recorded every 5 milliseconds and, as a

consequence, there is a difference in the amount of coordinates available per player. Because the goal

of this study is to predict the future movement and positions of all 11 soccer players, not all data is

useful and some pre-processing actions are needed to make this data set useful for the experiments

conducted within this study. These pre-processing actions will be discussed in the next paragraph.

Player ID      1       2       5       7       8       9       10      12      13      14
# Coordinates  44.555  56.436  56.242  56.273  56.478  55.399  56.505  47.478  56.156  56.499

Figure 3: An example of the anonymized player IDs and the number of x, y coordinates available per player. From this, we can conclude that not all players have the same number of coordinates available, which means that not all data is useful for this study.


5.2 Pre-processing data The ZXY data – which comprises the actual positions of the soccer players in the field – is

defined in x, y coordinates, where x represents the position in meters on the x-axis of the field and y

represents the position in meters on the y-axis of the field (see figure 1). As described in section 5.1,

the ZXY sensor data contains anonymized player IDs which enables us to predict the player IDs’

positions based on their x, y coordinates. However, not all player IDs contain the same number of

coordinates (see figure 3). This is because it occurs relatively often that not all players’ coordinates are

recorded every 5 milliseconds and, consequently, there is a difference in the amount of coordinates

available per player. Since the goal of this study is to predict the future positions of all eleven players

at the same time only a part of the data is useful for this study. Additionally, this study intends to use

coordinates from at most 100 subsequent timestamps of all 11 players as input for the model and – as

output - predict the coordinates (i.e., the players’ positions) at the next timestamp. Therefore, we only

extracted the timestamps that have coordinates of all 11 players and can complete a sequence of 100

subsequent timestamps. The selection of these timestamps and accompanying player IDs is performed

in Excel. First the data of the player IDs which are not useful (e.g., substitutes) are deleted. Secondly,

all timestamps are given a timestamp ID (i.e., timestamp 0.00.05 = 1, timestamp 0.00.10 = 2, and so

on).

Time  X player 1  Y player 1  X player 2  Y player 2  X player 3  Y player 3  X player 4  Y player 4
1     26.60       29.40       35.61       30.32       41.61       38.79       28.52       39.67
2     26.66       29.51       35.53       30.35       41.69       38.77       28.58       39.61
3     26.62       29.53       35.56       30.39       41.63       38.71       28.65       39.57
4     26.70       29.59       35.59       30.47       41.57       38.63       28.67       39.52
5     26.73       29.52       35.71       30.45       41.50       38.61       28.69       39.52

Figure 4: An overview of a part of the data set after pre-processing it in Excel. Here we can see the timestamp at which the data are gathered by the sensors. In columns 2 to 9 we can see the x-coordinate and the y-coordinate in meters of players 1 to 4.

After cleaning the data we have the remaining coordinates of the 11 player IDs that we need. Next, we

copied the timestamp IDs and deleted all duplicates in order to have a list of unique timestamp IDs.

Using COUNTIF statements, we then count the number of unique player IDs per timestamp.

All timestamps that do not have the coordinates of all the 11 player IDs are deleted. Finally, the data

is transposed so that each row contains a timestamp and the x, y-coordinates of all eleven player IDs.

For an example of the data after the pre-processing stage, see figure 4.
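For illustration, the same selection could be expressed in Python with pandas instead of Excel. The sketch below is an assumed alternative, not the procedure actually used; the column names follow the raw ZXY export shown in figure 2, and the file name and list of starter IDs are hypothetical.

```python
import pandas as pd

def to_wide(raw: pd.DataFrame, starting_ids: list) -> pd.DataFrame:
    """Keep only the timestamps at which every player in starting_ids has a
    measurement, then pivot to one row per timestamp with an x and y column
    per player (the wide layout shown in figure 4)."""
    df = raw[raw["ID"].isin(starting_ids)]
    # Count how many of the selected players are present at each timestamp
    counts = df.groupby("Timestamp")["ID"].nunique()
    complete = counts[counts == len(starting_ids)].index
    df = df[df["Timestamp"].isin(complete)]
    # One row per timestamp, one x/y column pair per player ID
    wide = df.pivot_table(index="Timestamp", columns="ID",
                          values=["X_pos", "Y_pos"], aggfunc="first")
    return wide.sort_index()

# Hypothetical usage on the raw ZXY export (column names as in figure 2):
# wide = to_wide(pd.read_csv("zxy_positions.csv"), starting_ids=[...])
```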

Due to the large amount of data, the data pre-processing is executed on an offsite server, accessed via a virtual machine running on an octa-core Intel Xeon CPU @ 3.33 GHz with 16 GB RAM. After pre-processing the data, what remains is the data set that serves as input for the model. The data consists

of 56.808 timestamps with x, y coordinates of eleven players. From these timestamps, multiple input

sequences are generated in Python. For example, for the setting in which we use 5 timestamps as input, overlapping sequences are created as follows: the first sequence consists of timestamps 1, 2, 3, 4, 5; the second sequence of timestamps 2, 3, 4, 5, 6; and so on. In this way, the dataset yields far more training sequences than a division into non-overlapping windows would. The same holds when we experiment with longer sequences (i.e., 20, 50, and 100 timestamps

respectively). The sequence-generating process is executed in Python.
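A minimal sketch of this sliding-window generation, assuming the pre-processed data is stored as a NumPy array with one row per timestamp and one column per player coordinate (22 columns for 11 players):

```python
import numpy as np

def make_sequences(data: np.ndarray, n_steps: int):
    """Slide a window of n_steps timestamps over the data; each window is one
    input sequence and the row directly after the window is its target."""
    X, y = [], []
    for start in range(len(data) - n_steps):
        X.append(data[start:start + n_steps])   # e.g. timestamps 1..5
        y.append(data[start + n_steps])         # e.g. timestamp 6
    return np.array(X), np.array(y)

# Example: 5 timestamps of 22 coordinates as input, the 6th timestamp as target
# X, y = make_sequences(coords, n_steps=5)   ->  X.shape == (N, 5, 22)
```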

Additionally, this study intends to experiment with better scaled data as well. Reason for this is the

frequency by which coordinates are generated (i.e., every 5 milliseconds). Because of this, we expect

little difference in coordinates between two subsequent timestamps. This may have a negative impact

on the ability of our models to learn trajectories. Therefore, we additionally make a copy of the pre-

processed data and convert the coordinates into absolute differences in coordinates between

subsequent timestamps. For example, when an x, y coordinate at timestamp 1 is (48, 29) and at

timestamp 2 this is (47, 32), the new coordinate of absolute difference becomes -1 for the x-coordinate

and +3 for the y-coordinate.
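A minimal sketch of this conversion, again assuming a NumPy array with one row per timestamp; the function name is illustrative only:

```python
import numpy as np

def to_changes(coords: np.ndarray) -> np.ndarray:
    """Convert positions (one row per timestamp, one column per coordinate)
    into the change between each pair of subsequent timestamps."""
    return np.diff(coords, axis=0)

# Example from the text: x goes from 48 to 47 and y from 29 to 32, so the
# change at that step is (-1, +3).
```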

5.3 Algorithms and software The goal of this study is to test the predictive ability of Long Short-Term Memory (LSTM) and

Gated Recurrent Unit (GRU) on a real-life soccer setting and pave the way for predicting future

movements and positions in the sports domain using deep learning. The LSTM and GRU are therefore the most important algorithms used for this study. These models are built in

Python, supported by the use of Keras and Tensorflow. In the next chapter we will further elaborate

on the models and the algorithms used for this study.

6. Experimental set-up In this section we describe the models that we propose for this study and the tasks that the model

should perform. Additionally, we present the algorithms that we use as well as the parameters of the

models. Finally, the metrics used for evaluating and comparing the performance of both the models

are described and motivated.

6.1 Model description Long Short-Term Memory (LSTM) is a modification of the RNN that is able to learn long-term dependencies. The LSTM was invented by (Hochreiter & Schmidhuber, 1997) and its main goal is to


remember information for longer time periods. A gated recurrent unit, invented by (Cho, et al., 2014),

is an improved model of the recurrent neural network as well and is quite similar to long short-term memory (LSTM). Gated recurrent units can be trained to retain information from the past without

deleting information that is not relevant for the prediction. This works as follows: It has a reset gate

(to determine how the new input should be combined with the memory) and an update gate (which is

responsible for determining the amount of previous information to keep). A gated recurrent unit’s aim

is to learn long-term dependencies. The main difference between the LSTM and the GRU is that, although both have gating units, the GRU's gating units regulate the information flow within the unit without the separate memory cells that LSTMs encompass (Chung, Gulcehre, Cho,

& Bengio, 2014). For an overview of the LSTM and GRU, see figure 5.

The task of both models, LSTM and GRU, in this setting is to predict the x, y coordinates of all eleven soccer players at timestamp t, based on an input sequence of n preceding timestamps. For example, a sequence of x, y coordinates at times t1, t2, t3, t4, and t5 is used to predict the x, y coordinates at time t6. But more

variations can be thought of and will be experimented with. This will be further described in paragraph

6.3. First, we will discuss the hyperparameters used for our models in the next paragraph.

Figure 5: An overview of the LSTM and GRU

6.2 Hyperparameter settings The (hyper)parameters of interest for this study consist of the train-test set ratio, the number of input sequences (i.e., how many timestamps are used as input for prediction), the number of nodes per layer, the number of hidden layers, and the learning rate. We will now discuss the selection of these

(hyper)parameters.

The train-test set ratio is set at 80 - 20, since the study of (Ma, et al., 2018) also used this ratio.

Moreover, this ratio is often chosen in other studies on object tracking as well. The number of

sequences used as input for the model will be varied in order to see how the model performs when

increasing the number of sequences. However, the dataset consists of 56.808 timestamps and,

consequently, one must keep in mind not to generate sequences that are too long so that the training-

set becomes too low for training the model proper training. Therefore, we will run the model on

sequences with 5, 20, 50, and 100 timestamps as input. The number of hidden layers will be set at 4,

since (Ma, et al., 2018) showed that this number of layers performs best. The number of nodes per

hidden layer is set at 22. Additionally, the learning rate will be set at l = 0.001 with rectified linear unit

as activation function and the mean squared error as a loss function. In the next paragraph we will

describe the experiments that we will perform for this study.
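A sketch of how such a model could be built in Keras with the settings listed above (4 recurrent layers, 22 nodes per layer, ReLU activation, MSE loss, learning rate 0.001). The optimizer is not specified in the thesis, so Adam is an assumption here, and the output size of 22 follows from the x, y coordinates of the 11 players.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(cell=layers.LSTM, n_steps=5, n_features=22):
    """Stack of 4 recurrent layers with 22 nodes each, ReLU activation and an
    MSE loss, following the settings in section 6.2. Pass cell=layers.GRU to
    build the GRU variant instead."""
    model = models.Sequential([
        cell(22, activation="relu", return_sequences=True,
             input_shape=(n_steps, n_features)),
        cell(22, activation="relu", return_sequences=True),
        cell(22, activation="relu", return_sequences=True),
        cell(22, activation="relu"),        # last recurrent layer returns one vector
        layers.Dense(n_features),           # x, y (change) for all 11 players
    ])
    # The optimizer is not stated in the thesis; Adam with lr = 0.001 is assumed
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="mse", metrics=["mae"])
    return model

# lstm_model = build_model(layers.LSTM)   # or build_model(layers.GRU)
```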

6.3 Experiments to perform The experiments that will be performed for this study are six-fold. We start by using a naïve

approach. As explained in the pre-processing section (paragraph 5.2), the player coordinates are

measured every 5 milliseconds, which means that we expect little differences in coordinates between

two subsequent timestamps. Therefore, we can expect our models to behave poorly when using these

coordinates as input data. We start by assessing the predictive performance when using these

coordinates. Additionally, we vary the number of timestamps as input sequences. This means that we

will assess the predictive performance as follows: predict the x, y coordinates at timestamp t, based

on input sequences of 5, 20, 50, or 100 timestamps. For example, a sequence of x, y coordinates at times t1, t2, t3, t4, and t5 is used to predict the x, y coordinates of all the eleven soccer players at time t6.

Next, we test the performance of both the LSTM and the GRU by varying the number of timestamps as

input sequences, just like in the previous experiment (i.e., number of timestamps as input sequences

= 5, 20, 50, and 100). Additionally, the time between timestamps will now be doubled. This is

because the measurement of timestamps occurs frequently, which means that there is not much time

between two timestamps. This makes it easier for both the LSTM and the GRU to predict, because

movements are limited. Also, we may expect that the model just uses the last input coordinates as

prediction-values because of the small time between timestamps. By doubling the time between two

timestamps (i.e., by deleting every second timestamp) the models may have more challenges to

predict the movement and the next positions of all the 11 soccer players.
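A minimal sketch of this doubling step, assuming the pre-processed coordinates are stored as a NumPy array with one row per 5-millisecond timestamp; the helper name is illustrative only:

```python
import numpy as np

def double_time_between_timestamps(coords: np.ndarray) -> np.ndarray:
    """Delete every second timestamp so that consecutive rows are twice as far
    apart in time (10 ms instead of 5 ms for the raw ZXY data)."""
    return coords[::2]
```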


In the third experiment we opt for a less naïve approach by creating a better scaling of the x, y

coordinates. So, for the third experiment we will modify the coordinates to the absolute changes in

coordinates in order to predict the change in the next coordinates. For example, an x, y coordinate at

timestamp t1 = [x = 25, y = 37] and an x, y coordinate at timestamp t2 = [x = 26, y = 36] will result in

an absolute change coordinate of x = +1 and y = -1. The reason for this is to get a better scaling: predicting change (i.e., movement) instead of exact coordinates. These coordinates of absolute change

will be used to assess the predictive performance when using input sequences of 5, 20, 50, and 100

timestamps again.

The fourth experiment is a combination of experiment 2 and 3: assess the predictive performance of

both the GRU and the LSTM when doubling the time between measurements and using the absolute

differences between coordinates as input. Again, this will be conducted using input sequences of

length 5, 20, 50, and 100.

Fifth, we test the predictive performance of both models by evaluating how far these models can

predict into the future. This works as follows: We start by using 5 timestamps as input to predict the

x, y coordinates at a timestamp further in the future. For example, a sequence of x, y coordinates at times t1, t2, t3, t4, and t5 is used to predict the x, y coordinates at time t10, t20, t50, or t100. Again,

this will be performed with the absolute differences between the timestamps which are described in

experiment 3.

Finally, we construct a predicted trajectory of all eleven soccer players. Basically, this means that we

use 5 timestamps as input sequences in order to predict the coordinates at the sixth timestamp: t1, t2,

t3, t4, t5 → predicted t6. Then, we remove t1 and add the predicted t6 to predict t7: t2, t3, t4, t5,

predicted t6 → t7. We repeat this process to construct a trajectory of 40 timestamps.
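A minimal sketch of this rolling prediction, assuming a trained Keras-style model (any object with a predict method) and a seed window of 5 observed timestamps with 22 coordinate columns; the function name is illustrative only:

```python
import numpy as np

def predict_trajectory(model, seed_window: np.ndarray, horizon: int = 40):
    """Roll the model forward: start from a (5, 22) window of observed steps,
    predict the next step, drop the oldest step, append the prediction, and
    repeat until `horizon` predicted timestamps are collected."""
    window = seed_window.copy()
    trajectory = []
    for _ in range(horizon):
        next_step = model.predict(window[np.newaxis, ...], verbose=0)[0]
        trajectory.append(next_step)
        window = np.vstack([window[1:], next_step])  # slide the window forward
    return np.array(trajectory)  # shape: (horizon, 22)
```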

We now turn to describing our evaluation metrics used to assess the predictive performance of both

our models in all experiments.

6.4 Evaluation metrics

6.4.1 Metrics Because the prediction of future movements and positions in a soccer setting using the LSTM

and GRU on the players’ coordinates only is relatively new, the performance of both models will be

compared to each other in order to determine which model performs best in a real-life soccer setting


based on x, y-coordinates. The metrics used to evaluate the models’ performance are the following:

MSE, RMSE, and MAE (see figure 6 for the formulas of these error metrics).

These metrics are chosen based on other studies that predict future movements or positions in other

settings. Since the models aim to predict coordinates or the difference in coordinates (i.e., movement), the interest is in lowering the error metrics. Therefore, MAE, MSE, and RMSE are suitable metrics for evaluation in this setting.

Figure 6: Evaluation metrics for assessing model performance
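For reference, a standard formulation of these three metrics over n predicted values with ground truth y_i and prediction ŷ_i (the thesis gives the formulas in figure 6) is:

```latex
\mathrm{MAE}  = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|, \qquad
\mathrm{MSE}  = \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```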

6.4.2 Benchmark In addition to the evaluation metrics described in the previous paragraph, we will compare the

performance of our models against a benchmark which can be seen as the base-case. Because the

timestamps are close to each other – which means that we get player coordinates every 5 milliseconds

– we may expect only very small changes in coordinates between timestamps. Therefore, we compare

our predictions to the last input coordinates and see if there are any deviations. For the experiments

where we will be using the differences in player coordinates between timestamps as input (i.e.,

experiments 3 to 6), this means that we compare the performance of our models against a base-

case of no change.
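A minimal sketch of how this no-change baseline can be evaluated on the difference-scaled data (an illustrative helper, not code from the thesis):

```python
import numpy as np

def baseline_errors(y_true: np.ndarray):
    """No-change baseline on the difference-scaled data: predict a change of
    zero for every player coordinate and report MAE, MSE and RMSE."""
    err = y_true - np.zeros_like(y_true)
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    return mae, mse, float(np.sqrt(mse))
```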

7. Results In this section, the results of the experiments described in the previous section will be presented.

We will compare the predictive performance of both models – LSTM and GRU – to each other using

the metrics defined earlier. The results will be described per experiment.

7.1 Experiment 1: Varying number of timestamps as input sequences The first experiment was to assess the predictive performance of both models when using the

players’ coordinates of a varying number of timestamps as input sequences to predict the coordinates

of the soccer players for the next timestamp. We assessed the predictive performance of the LSTM

and the GRU with the number of timestamps used as input set at 5, 20, 50, and 100, respectively. The

results of these experiments look promising at first sight. However, when comparing the results to

the base-case described in section 6.4, we see that the results are exactly the same. This means that

the model is using the last input coordinates as a prediction, which can be explained by the very small

changes in coordinates between the timestamps. This is due to the fact that the timestamps are

recorded relatively frequently (i.e., every 5 milliseconds) and that, as a consequence, the change

between coordinates is so small that the model learns to use the last input coordinates as predictions.

7.2 Experiment 2: Doubling the time between timestamps For the second experiment, we vary the number of timestamps with the players’ coordinates

as input in the same way as we did for the first experiment (i.e., 5, 20, 50, 100). Additionally, we

doubled the time between the timestamps by using sequences with more difference in time. For

example, when using 5 timestamps as input, we can use a sequence of timestamps like t1, t3, t5, t7,

and t9 to predict the coordinates of the 11 players at t10.

Again, these results are compared against the base-case (i.e., using the last input coordinates as

prediction). Just like in experiment 1, it can be concluded that doubling the time between timestamps does not lead to a model that learns to predict better than the

benchmark. Instead, the models learn to return the last input coordinates since these give the lowest

error metrics. Therefore, both experiment 1 and 2 show that the scaling of the data is not suitable for

the learning ability of the models and as a consequence, for the rest of the experiments the data is

modified to absolute changes in coordinates between two timestamps.

7.3 Experiment 3: Varying number of timestamps as input sequences when using absolute differences between coordinates

For this experiment we will modify the coordinates to the absolute changes in coordinates in

order to predict the change in the next coordinates. For example, an x, y coordinate at timestamp t1 =

[25.8, 37.7] and an x, y coordinate at timestamp t2 = [25.9, 37.4] will result in an absolute change

coordinate of x = +1 and y = -3 (where one unit corresponds to a change of 0.1 meter). These coordinates are then used to assess the predictive performance

when using input sequences of 5, 20, 50, and 100 timestamps just like in experiment 1 and 2. The

results show a decrease in all error metrics compared to the benchmark (see figure 7). This

can be explained by the better scaling of all player coordinates, which definitely benefits the model

compared to the previous two experiments. To explain how to interpret the results we will give an

example of the LSTM-5, where an input sequence of 5 timestamps was used to predict t6 using the LSTM model (see

the first row of figure 7). Here, we can see that the mean absolute error, MAE, = 0.6080. This should

be interpreted as follows: when the model predicts +1 on the x-coordinate, it predicts that the player’s


x-coordinate will change by 10 centimeters (e.g., from 24.8 to 24.9). Consequently, the absolute

error in centimeters is: 0.6080 * 10 = 6.080 centimeters.

5 timestamps as input     MAE     MSE     RMSE
GRU                       0.6101  2.8702  1.6942
LSTM                      0.6080  2.8765  1.6960
Baseline                  0.9243  6.745   2.5971

20 timestamps as input    MAE     MSE     RMSE
GRU                       0.5999  2.8996  1.7028
LSTM                      0.5987  2.8553  1.6898
Baseline                  0.9194  6.6693  2.5825

50 timestamps as input    MAE     MSE     RMSE
GRU                       0.6074  2.9366  1.7137
LSTM                      0.6042  2.9094  1.7057
Baseline                  0.9244  6.7081  2.5900

100 timestamps as input   MAE     MSE     RMSE
GRU                       0.6184  2.9503  1.7176
LSTM                      0.6140  2.9428  1.7155
Baseline                  0.9293  6.7198  2.5923

Figure 7: The results of experiment 3. From top to bottom: the results when using 5, 20, 50, and 100 timestamps as input. From these results, we can derive that the LSTM performs slightly better, since it shows lower error metrics.

7.4 Experiment 4: Doubling the time between timestamps when using absolute differences between coordinates

The fourth experiment is a combination of experiment 2 and experiment 3: assess the

predictive performance of both the GRU and the LSTM when doubling the time between

measurements and using the absolute differences between coordinates as input. Again, this

experiment is conducted using input sequences of length 5, 20, 50, and 100.

The results, depicted in figure 8, again show a decrease in the error metrics compared to the

benchmark. This can be explained by the better scaling of the coordinates compared to experiment 1

and 2. Additionally, increasing the time between two measurements of coordinates ensures that there

are bigger differences in coordinates between timestamps. This benefits the models in their predictive

performance. What is noticeable is the fact that on the longer input sequences (i.e., 50 and 100), the

model still shows a good predictive performance compared to our benchmark. However, the error


metrics are getting closer to the benchmark results. This may be due to the relatively large amount of

input data which the models find hard to derive structure from in a setting with just 4 hidden layers

and 22 nodes per hidden layer.

5 timestamps as input     MAE     MSE      RMSE
GRU                       0.8349  3.9281   1.9819
LSTM                      0.8374  3.9475   1.9868
Baseline                  1.3178  10.8778  3.2982

20 timestamps as input    MAE     MSE      RMSE
GRU                       0.8634  4.2253   2.0555
LSTM                      0.8470  4.0884   2.0220
Baseline                  1.3156  10.8481  3.2936

50 timestamps as input    MAE     MSE      RMSE
GRU                       0.8725  4.3106   2.0762
LSTM                      0.8605  4.1748   2.0432
Baseline                  1.3211  10.9151  3.3038

100 timestamps as input   MAE     MSE      RMSE
GRU                       0.8811  4.3531   2.0864
LSTM                      0.8698  4.2200   2.0543
Baseline                  1.3293  10.9828  3.3140

Figure 8: The results of experiment 4. From top to bottom: the results when using 5, 20, 50, and 100 timestamps as input. On average, we can see that the LSTM models perform slightly better, except in the first case.

7.5 Experiment 5: How far can we predict? Testing how far we can predict works as follows: We start by using 5 timestamps as input to

predict the x, y coordinates at a timestamp further in the future. For example, a sequence of x, y coordinates at times t1, t2, t3, t4, and t5 is used to predict the x, y coordinates at time t10, t20, t50, or t100. Again,

this will be performed with the absolute differences between the timestamps which are described in

experiment 3. Figure 9 shows the results of this experiment. It shows that both LSTM and GRU perform

approximately equally well. On top of that, for all predictions both models outperform the benchmark

significantly. What is notable is the fact that both the LSTM and the GRU seem to perform only slightly

worse when predicting further in time (for example, comparing the performance of both the GRU and

the LSTM on predicting t+10 and t+100). Comparing this to the previous experiment, it is possible that the models were simply not deep enough there to deal with the larger amount of input data.


t = +10    MAE     MSE     RMSE
GRU        0.6203  2.9065  1.7048
LSTM       0.6204  2.9085  1.7054
Baseline   0.9210  6.7297  2.5942

t = +20    MAE     MSE     RMSE
GRU        0.6248  2.8738  1.6952
LSTM       0.6257  2.8850  1.6985
Baseline   0.9306  6.7886  2.6055

t = +50    MAE     MSE     RMSE
GRU        0.6226  2.8863  1.6989
LSTM       0.6180  2.8772  1.6962
Baseline   0.9181  6.6564  2.5800

t = +100   MAE     MSE     RMSE
GRU        0.6187  2.8522  1.6888
LSTM       0.6157  2.8576  1.6904
Baseline   0.9172  6.6168  2.5723

Figure 9: Results of experiment 5. Based on 5 timestamps as input sequences we predict (from top to bottom): the 5 + 10 = 15th timestamp, the 5 + 20 = 25th timestamp, the 5 + 50 = 55th timestamp, and the 5 + 100 = 105th timestamp. We can see that the GRU performs slightly better in the first two scenarios, whereas the LSTM performs slightly better in the last two scenarios.

7.6 Experiment 6: Predicting longer trajectories In this experiment we constructed a trajectory of all eleven soccer players. Basically, this means

that we use 5 timestamps as input sequences in order to predict the coordinates at the sixth

timestamp: t1, t2, t3, t4, t5 → predicted t6. Then, we remove t1 and add the predicted t6 to predict

t7: t2, t3, t4, t5, predicted t6 → t7. We repeat this process until we construct a trajectory of 40

timestamps. The results, depicted in figure 10, show that the GRU performs only slightly better than

the LSTM. However, the differences are extremely small which means that both the GRU and LSTM

could be suitable models for trajectory prediction. However, it should be noted that the error metrics

increase significantly as compared to the other experiments. This makes intuitive sense, because

predictions are used as input for making new predictions in this experiment. By using predictions that

deviate from the ground-truth as input for making new predictions, it makes sense that the model will

return wrong predictions again.

Trajectory prediction   MAE     MSE     RMSE
GRU                     0.1988  0.9682  0.983951
LSTM                    0.1999  0.9691  0.984426
Baseline                0.3750  1.9318  1.389899

Figure 10: Results of experiment 6. The GRU performs slightly better than the LSTM at trajectory prediction, and both outperform the benchmark.

8. Discussion & Limitations The experiments performed and the resulting predictive performance of both the LSTM and the GRU show multiple things. First, when comparing experiment 1 and 2 to experiment 3, 4, and

5 we can see that modifying the players’ coordinates to the absolute difference between coordinates

at different timestamps benefits the model. From experiment 1 and 2, the results show that the models

basically learn to return the last input coordinates as predictions. This means that the models are not

learning anything. However, this was expected because the time between the measurement of the

coordinates is extremely small (i.e., 5 milliseconds). Therefore, the changes in coordinates between

timestamps are extremely small and consequently, when ‘predicting’ the same coordinates as the last

input coordinates, the model predicts that there is no change in coordinates and this results in the

lowest error metrics.

In order to ensure that both the LSTM and the GRU are capable of learning we therefore modified the

coordinates to the absolute changes between two coordinates (e.g., t1 = (98.2; 47.3) and t2 = (98.1; 47.5) are converted to x = -1 and y = +2). Then, we let the models predict the change in coordinates for the

next timestamp. The results show that both models learn well and perform better than the benchmark,

but the LSTM performs slightly better on almost all experiments resulting in lower error metrics. For

example, the LSTM with 5 timestamps as input to predict t6 (see figure 7) results in an MAE of 0.6080. This means that we can predict with a deviation of only a few centimeters

from the ground-truth. This is because of the better scaling of the coordinates. Additionally, doubling

Additionally, doubling the time between coordinates (i.e., using t1, t3, t5, t7, and t9 instead of t1, t2, t3, t4, and t5 as input) results in an even better predictive performance. Although both the LSTM and the GRU outperform the benchmark for all input sizes, a downside of experiments 3 and 4 is that the longer input sequences (i.e., 50 and 100 timestamps as input) perform slightly worse than the shorter input sequences. The models seem to have some trouble dealing with more information, which may be due to the depth of the models or simply because there is too much input to derive the right structure from. This is confirmed by experiment 5, where we used 5 timestamps as input sequence and predicted t+10, t+20, t+50, and t+100; these results appear to be more stable because of the limited number of input coordinates. A limitation of this study is therefore the depth of the models for some experiments: with deeper models, the GRU and LSTM might be better able to deal with longer input sequences and reach a higher predictive performance.

Additionally, experiment 5 assessed how far the models can predict into the future, but it did not model trajectories based on predictions only (i.e., making predictions and using these predictions as input to model the next movement). In that way, one could model an entire trajectory instead of predicting just one timestep ahead. Since this type of experiment is performed in many multi-object tracking studies, we conducted it as well, in experiment 6. There, both the LSTM and the GRU perform approximately equally well on a trajectory prediction of 40 timestamps, but the error metrics increase significantly. This is because the models use predictions as input for making new predictions: if these input predictions are off, the model will automatically return further deviating predictions. This is a limitation of this experiment.

Another important limitation of this study is that it is hard to compare the results to full multi-object tracking studies based on visual data. This study predicts coordinates, which are numerical values with two decimal places (e.g., 28.47), so a prediction can only be judged by how far it deviates from the ground truth, whereas many multi-object tracking studies based on visual data have bounding boxes as ground truth and can therefore verify whether a prediction is right or wrong. In other words, this study assesses how far a prediction is off the ground-truth coordinates, while studies that build a complete multi-object tracking algorithm on visual data assess whether a prediction is right or wrong. It is therefore hard to conclude whether our models are better than those of other studies. This study intended to pave the way for multi-object tracking in soccer using deep learning and should be seen as a first step in this area; applying these models to a dataset that also contains visual data would be an interesting direction for future work. Overall, when comparing the performance of the LSTM to the GRU specifically, the LSTM usually performs a little better than the GRU, so we would advise to start by further investigating the performance of the LSTM in such a setting.

9. Conclusions

From the results of the experiments conducted within this study, we can draw multiple conclusions. First, raw coordinates measured in meters are not suitable for making future predictions with deep learning models such as the LSTM and GRU. By scaling the data properly (in this study by calculating the absolute differences between coordinates at successive measurements), the data becomes more appropriate for prediction. Doing so also overcomes the burden of overly frequent measurements: as stated earlier, the differences between coordinates at subsequent timestamps are extremely small because the coordinates are measured very often. Keeping only every nth measurement (i.e., downsampling) would solve this as well, but at the cost of a significant decrease in the data available for proper training of the models (a small comparison of the two options is sketched below). We therefore conclude that converting the x, y coordinates to absolute differences between two coordinates at different times is the most suitable solution to this problem.
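To illustrate this trade-off, the short sketch below contrasts the two options under the assumption that the positions are stored as a NumPy array with one row per measurement; the variable names and the downsampling factor are hypothetical.

import numpy as np

# Hypothetical position data: 45,000 measurements of 22 values
# (x, y for eleven players)
coords = np.random.randn(45_000, 22)

# Option 1: keep only every 10th measurement -> 90% of the rows are lost
downsampled = coords[::10]
print(downsampled.shape)   # (4500, 22)

# Option 2: convert to differences between consecutive measurements
# -> only a single row is lost, and the values are better scaled
deltas = np.diff(coords, axis=0)
print(deltas.shape)        # (44999, 22)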


Second, we can conclude that both the LSTM and the GRU perform well on the task of predicting movement and future positions, but the LSTM performs slightly better, resulting in lower error metrics in almost all experiments. From experiment 4, however, we can conclude that the LSTM and GRU with 4 hidden layers are not deep enough when dealing with longer input sequences (i.e., 50 or 100 timestamps). While the results still outperform the benchmark, the predictive performance decreases compared to the shorter input sequences (i.e., 5 or 20 timestamps). Deeper models may therefore enhance the predictive performance of both the LSTM and the GRU on longer input sequences. Additionally, this study assessed predictive performance by predicting not just a single timestep but entire trajectories, using predictions as input to the model, which allows a better comparison between an entire ground-truth trajectory and a predicted trajectory. On this experiment, too, the models perform rather well, although it must be noted that the error metrics increase significantly compared to the other experiments.

Comparing the two models across the experiments conducted within this study, we can conclude that the LSTM usually performs slightly better than the GRU. The differences between the LSTM and GRU are rather small, however, so both could be suitable for predicting movement and player positions in a soccer setting. It remains hard to assess whether our models perform better than complete multi-object tracking algorithms, since those studies usually have bounding boxes as ground-truth data and therefore often assess performance with MOTA (multi-object tracking accuracy), meaning that they examine whether positions or movements are predicted right or wrong. In this study we only predict coordinates (numerical data) with two decimal places (e.g., 27.48), so an exactly right prediction is much harder to make. In the next section, we give some recommendations for future work that can help to further pave the way for deep learning models that predict movement and positions in a soccer setting.

10. Future work

This study attempted to predict the movement and future positions of soccer players using deep learning, thereby paving the way for multi-object tracking in soccer using deep learning. Since this can be seen as a starting point, several suggestions for future work that would further contribute to multi-object tracking in soccer can be made.

First, in addition to the experiments performed in this study, we suggest assessing the predictive performance of both the GRU and the LSTM with more hidden layers. Especially for longer input sequences, we saw that the input appears to be too large for the models to infer the right structure from and to remain as stable as with shorter input sequences.

Second, this study used input sequences to predict only one x, y coordinate in the future for each of the eleven players. It would be interesting to model entire trajectories by using predicted coordinates as input for predicting the next coordinates. In this way, one would be able to predict entire trajectories and compare them to the ground-truth trajectory, which would give a deeper understanding of the predictive performance over a longer period of time instead of a single timestep. With the experiments conducted within this study we established a basis, finding that the LSTM and GRU are capable of predicting further into the future (experiment 5); this is further developed by modeling trajectories in experiment 6. There, however, only trajectories of 40 timestamps are predicted. It would be interesting to model longer trajectories in future work to see if the models remain stable over time, especially since we concluded that our models are not deep enough for longer sequences.

Third, this study used a cleaned data set in which all x, y coordinates of the soccer players are available. It would be interesting to see how the models deal with players that move off the soccer pitch (e.g., for a temporary injury treatment) or with substitutes. Objects disappearing from or re-appearing in the space under study is one of the well-known challenges in multi-object tracking, and tackling it would further contribute to the development of deep learning models for multi-object tracking in soccer.

Fourth, this study was performed using only ZXY sensor data (i.e., player coordinates that indicate their position in meters on the soccer pitch). The vast majority of multi-object tracking algorithms in the literature that use deep learning, however, take video or image data as input. There, the athletes, pedestrians, and so on are usually detected in every frame using pre-trained networks such as VGG-16 or ResNet-50, after which the coordinates of the detected objects are used as input for LSTM or GRU models, with bounding-box coordinates or x, y coordinates as ground truth (a schematic sketch of such a pipeline is given below). This study served as a starting point for multi-object tracking in a real-life soccer setting; the next step could be to apply these models to a video or image data set in order to further develop the use of deep learning for multi-object tracking in a soccer setting.
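As an illustration of what such a video-based pipeline could look like, the sketch below uses a placeholder detector in place of a pre-trained VGG-16 or ResNet-50 detection network and a small Keras GRU as the sequence model. All function names, shapes, and parameters are hypothetical and only indicate the overall structure; they do not describe the setup of any existing study.

import numpy as np
from tensorflow.keras import layers, models

def detect_players(frame):
    """Placeholder for a pre-trained detector (e.g., a VGG-16 or
    ResNet-50 based network) that returns the (x, y) centre of each
    of the eleven tracked players in one frame, shape (11, 2)."""
    return np.random.rand(11, 2)  # stub detections

def frames_to_sequences(frames, window_size=5):
    """Run the detector on every frame and build input windows and
    next-step targets for the sequence model."""
    positions = np.array([detect_players(f).ravel() for f in frames])  # (n, 22)
    X = np.array([positions[i:i + window_size]
                  for i in range(len(positions) - window_size)])
    y = positions[window_size:]
    return X, y

# Hypothetical video: 100 small RGB frames standing in for real footage
frames = np.zeros((100, 72, 128, 3), dtype=np.uint8)
X, y = frames_to_sequences(frames)   # X: (95, 5, 22), y: (95, 22)

# A small GRU that predicts the next positions from the detected ones
model = models.Sequential([
    layers.GRU(64, input_shape=(5, 22)),
    layers.Dense(22),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1, batch_size=32, verbose=0)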
