An introduction to football modelling at Smartodds Oxford ...€¦ · to football modelling at...

32
An introduction to football modelling at Smartodds Robert Johnson An introduction to football modelling at Smartodds Oxford SIAM Conference 2011 Robert Johnson Smartodds Ltd February 9, 2011

Transcript of An introduction to football modelling at Smartodds Oxford ...€¦ · to football modelling at...

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    An introduction to football modelling atSmartodds

    Oxford SIAM Conference 2011

    Robert Johnson

    Smartodds Ltd

    February 9, 2011

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Introduction

    Introduction to Smartodds

    Practical example: building a football model

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    What is Smartodds about?

    Smartodds provides statistical research andsports modelling in the betting sector

    Quant team research and implement the sportsmodels

    Primary focus is on Football, however we alsomodel Basketball, Baseball, American Football,Ice Hockey and Tennis

    Wide range of interesting problems to work on

    Actively recruiting!

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Building a Football model

    Suppose we decide to build a football model forthe English football leagues

    Here we model the divisions Premier League,Championship, League 1 and League 2

    There are 92 teams in total to model

    We want to predict the probability of team Awinning against team B where team A and teamB could be from any of the 4 leagues

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Literature review

    Maher (1982) assumed independent Poissondistributions for home and away goals

    Means based on each teams’ past performance

    Dixon and Coles (1997) took this idea further byaccounting for fluctuations in performance ofindividual teams and estimation between leagues

    Dixon and Robinson (1998) modelled the scoresduring a game as a two-dimensional birth process

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Model formulation

    Assume that home and away goals follow aPoisson distribution

    Pr(x goals) =λxe−λ

    x!

    Pr(y goals) =µye−µ

    y !

    To estimate the probabilities of x and y goals weneed λ and µ

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Model 1: Mean goals

    Assume that home and away teams are expectedto score the same number of goals

    Take average goals scored in a game in Englandas 2.56 and divide by two

    λ = 1.28

    µ = 1.28

    However we may believe that there is someadvantage associated with playing at home

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Model 2: Home Advantage

    Include a term to take account of homeadvantage

    λ = γ × τ

    µ = γ

    γ is the common mean and τ represents thehome advantage

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Model 2: Home Advantage (Cont)

    Mean goals scored by the away team in the fourleagues we model English Leagues is 1.10 giving

    γ = 1.10

    This implies mean goals scored by the hometeam are 2.56− 1.10 = 1.46Using the above we can estimate τ as

    τ = 1.46/1.10 = 1.33

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Model 3: Team Strengths

    Previous attempts assumed all teams of equalstrength

    Can add team strength parameters for each team

    Better teams score more goals. Give each teaman attack parameter denoted α

    Better teams concede fewer goals. Give eachteam a defence parameter denoted β

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Model 3: Team Strengths (Cont)

    Write λ and µ in terms of the attack anddefence parameters of the home and away teams,which we denote by i and j , giving

    λ = γ × τ × αi × βj

    µ = γ × αj × βi

    The model is overparameterised, so we apply theconstraints

    1

    n

    n∑i=1

    αi = 1,1

    n

    n∑i=1

    βi = 1.

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Model 3: Pseudolikelihood

    The pseudolikelihood for this model is:

    L(γ, τ, αi , βi ; i = 1, . . . , n) =∏k

    {exp(−λk)λxkk exp(−µk)µykk }

    φ(t−tk )

    φ(·) is an exponential downweighting function,which allows us to place less weight on oldergames

    Other downweighting functions could be used

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Estimation techniques

    Obtaining the parameter estimates is notstraightforward

    In this example we have 186 parameters toestimate

    Various optimisation techniques could be used toobtain parameter estimates (numericalmaximisation of the likelihood function, MCMC)

    High dimensional problems may also requiremore sophisticated computing solutions (MPI)

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Parameter estimates

    These are Smartodds’ current estimates of theattack and defence parameters of the top 6teams in the Premier League

    Team Attack Parameter Defence Parameter

    Chelsea 3.15 0.34Man Utd 3.08 0.35Arsenal 2.84 0.37

    Man City 2.44 0.42Tottenham 2.22 0.44Liverpool 2.12 0.39

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Predicting outcomes

    Suppose Man Utd are playing at home to ManCity

    Using the parameter estimates we get

    λ = 1.10× 1.33× 3.08× 0.42 = 1.89

    µ = 1.10× 2.44× 0.35 = 0.94

    We can use λ and µ to obtain the probability ofMan Utd winning the match

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Predicting outcomes (Cont)

    The probability of a specific score is given asfollows

    Pr(x , y) =λxe−λ

    x!

    µye−µ

    y !

    So the probability of the score, Man Utd 2 ManCity 1, is

    Pr(2, 1) =1.892e−1.89

    2!

    0.941e−0.94

    1!= 0.099

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Predicting outcomes (Cont)

    Obtain the probability matrix of all possiblescores

    0 1 2 3 4 . . .

    0 0.059 0.112 0.105 0.066 0.031 . . .1 0.055 0.105 0.099 0.062 0.029 . . .2 0.026 0.049 0.047 0.029 0.014 . . .3 0.008 0.015 0.015 0.009 0.004 . . .4 0.002 0.004 0.003 0.002 0.001 . . ....

    ......

    ......

    .... . .

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Predicting outcomes (Cont)

    Sum over all events where home goals aregreater than away goals

    0 1 2 3 4 . . .

    0 0.059 0.112 0.105 0.066 0.031 . . .1 0.055 0.105 0.099 0.062 0.029 . . .2 0.026 0.049 0.047 0.029 0.014 . . .3 0.008 0.015 0.015 0.009 0.004 . . .4 0.002 0.004 0.003 0.002 0.001 . . ....

    ......

    ......

    .... . .

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Predicting outcomes (Cont)

    Giving the probability that Man Utd win at hometo Man City as 59.6%

    0 1 2 3 4 . . .

    0 0.059 0.112 0.105 0.066 0.031 . . .1 0.055 0.105 0.099 0.062 0.029 . . .2 0.026 0.049 0.047 0.029 0.014 . . .3 0.008 0.015 0.015 0.009 0.004 . . .4 0.002 0.004 0.003 0.002 0.001 . . ....

    ......

    ......

    .... . .

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Practical issues

    Betfair’s odds imply Man Utd has a 63% chanceof winning the game, potentially leaving value fora bet on Man City. However, should we bet?

    These models take into account no externalinformation about match circumstances

    InjuriesMotivationFatigueNewly signed players

    So betting off a mathematical model would bedangerous!

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Shortcomings of the model

    If we compare the expected full-time scoresunder the model with the observed scores, wefind our modelling assumptions don’t hold

    Goals don’t have a Poisson distributionGoals scored by the home and away teams aren’tindependent

    Dixon and Coles corrected for this by modifyingthe predicted distribution to increase probabilityof draws and 0-1 and 1-0 scores

    However this isn’t entirely satisfactory — wouldbe better to model what is happening directly

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Goal time distribution

    0.00

    0.01

    0.02

    0.03

    0.04

    0.05

    Goal time (mins)

    Den

    sity

    0 10 20 30 40 50 60 70 80 90

    Goals in injury time at the end of each half arerecorded as 45 / 90 min goals

    Goal rate steadily increases over the course ofthe game

    Notice the spikes every 5 minutes in the secondhalf - due to rounding?

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Dixon and Robinsons’ model

    If we assume that the goal scoring processes forthe home and away teams are independenthomogeneous Poisson processes then our modelreduces to the full time model discussedpreviously.

    For match k between teams i and j

    λk(t) = λk = γ × τ × αi × βj

    µk(t) = µk = γ × αj × βi

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Dixon and Robinsons’ model (continued)

    Three changes:

    1 Goal-scoring rate dependent on the current score

    2 Modelling of injury time

    3 Increasing goal-scoring intensity through thegame (due to tiredness of players)

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    (1) Goal-scoring rate dependent on current score

    Assume that home and away scoring processesare independent Poisson processes

    Scoring rates are piecewise constant

    Home and away intensities are constant until agoal is scored and only change at these times

    Denote λxy and µxy as parameters determiningthe scoring rates when the score is (x ,y)

    Scoring rates are now

    λk(t) = λxyλk

    andµk(t) = µxyµk

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Estimates of λ(x , y) and µ(x , y)

    λ̂(0, 0) = 1µ̂(0, 0) = 1

    λ̂(1, 0) = 0.88µ̂(1, 0) = 1.35

    λ̂(0, 1) = 1.10µ̂(0, 1) = 1.07

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    (2) Increase the scoring rate during injury time

    Goals scored during injury time are recorded ashaving occurred at either 45 or 90 minutes.

    Define two new parameters ρ1 and ρ2 to modelinjury time.

    The adjusted scoring rates are

    λk(t) =

    ρ1λxyλk t ∈ (44, 45]mins,ρ2λxyλk t ∈ (89, 90]mins,λxyλk otherwise

    and similarly for µk(t)

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    (3) Increasing goal-scoring intensity

    Allow the scoring intensities to increase over time

    Model scoring rates as time inhomogeneousPoisson processes with a linear rate of increase

    Replace λk(t) and µk(t) with

    λ∗k(t) = λk(t) + ξ1t,

    µ∗k(t) = µk(t) + ξ2t

    ξ1 and ξ2 could be constrained to be positive toensure that the hazard functions above areconstrained to always be positive, but in practicethis is not neccessary

    Scoring rates are estimated to be about 75%higher at the end of the game then at the startof the game.

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Model usage

    This ‘in-running’ model can be useful in its ownright (for deriving in-running prices)

    Also explains the home/away dependencies andnon-Poisson pdfs observed in the data

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Summary

    The Dixon-Coles model is a simple and robustfull-time score model, but not all of itsassumptions are met

    A continuous time model such as theDixon-Robinson model can model dependenciesbetween home and away scoring rates

    Mathematical models cannot model team news(unless this is incorporated into the modelsomehow)

    These models can be extended to other sports bychanging the distributions, eg

    Normal distribution for American FootballNegative binomial for baseball

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    References

    M.J. Maher, 1982, Modelling association footballscores, Statist. Neerland., 36, 109-1188

    M. Dixon and S.G. Coles, 1997. ModellingAssociation Football Scores and Inefficiencies inthe Football Betting Market. Applied Statistics,46(2), 265-280

    M. Dixon and M. Robinson, 1998. A birthprocess model for association football matches.JRSS D, 47(3), 523-538

  • Anintroductionto footballmodelling atSmartodds

    RobertJohnson

    Interested?

    If you are interested in sports modelling andpossess the following skills:

    Post graduate qualification (at least MMath /MSc, PhD. preferred) in mathematics, statisticsor another subject with considerablemathematical contentExperience in developing and implementingmathematical / statistical modelsExperience of computer programming(preferably in C++, C, R or Python)Enthusiasm, self-motivation and the ability towork under pressure to strict deadlines

    Then email us at [email protected]

    For more information see our website:http://www.smartodds.co.uk

    [email protected]://www.smartodds.co.uk

    HomeIntroductionWhat is SmartOdds about?Building a Football modelLiterature review

    Model formulationModel 1: Mean goalsModel 2: Home AdvantageModel 3: Team StrengthsModel 3: PsedolikelihoodEstimation techniquesParameter estimatesPredicting outcomesPractical issuesShortcomings of the modelGoal time distribution

    Dixon and Robinson's model(1) Goal-scoring rate dependent on current scoreEstimates of λ(x,y) and μ(x,y)

    (2) Increase the scoring rate during injury time(3) Increasing goal-scoring intensityModel usage

    SummaryReferencesInterested?