Developing a Data-Driven Player Ranking in Soccer Using...

Developing a Data-Driven Player Ranking in Soccer Using Predictive Model Weights
Joel Brooks, Matthew Kerr, John Guttag
Massachusetts Institute of Technology

Motivation

•  Teams this year in La Liga are spending over $3 billion on their players
•  How do teams evaluate a player's worth?
•  Quantitative metrics can help justify subjective evaluations, and provide new insight into non-obvious player contributions

Image source: http://espn.go.com/sports/soccer/news/_/id/5580467/european-football-eating-itself

PassingContribu8onsonOffense

•  Passingstrategyisakeycomponentofoveralloffensivesuccess

•  Wefocusedspecificallyonthecontribu8onofspecificpassesonoffense

•  Passestellyoualotabouthowmuchanindividualplayercontributesonoffenseoutsideofgoalsandassists

How to rank players from passes?

1.  Locations of passes within a possession
2.  Distributed location feature representation
3.  Supervised linear model for predicting shots
4.  Pass value measurement based on model weights

The data

•  (x, y) coordinates of all pass origins and destinations from the 2012-2013 La Liga season
•  >300,000 passes
•  380 games
•  >500 players

Image source: https://flic.kr/p/nvVaHM

Pass Location Representation

[Figure: pitch diagram with ends labeled DEFENSE and OFFENSE]

Sparse Pass Location Representation

[Figure: pitch diagram with ends labeled DEFENSE and OFFENSE]

Origin (zone 10) = [0,0,0,0,0,0,0,0,0, ,0,0,0,0,0,0,0,0]
Destination (zone 14) = [0,0,0,0,0,0,0,0,0,0,0,0,0, ,0,0,0,0]

Dense Pass Location Representation

[Figure: pitch diagram with ends labeled DEFENSE and OFFENSE]

Origin (zone 10) = [0,0,0,0,0,0, ,0,0, ,0,0,0,0,0,0,0,0]
Destination (zone 14) = [0,0,0,0,0,0,0,0,0,0, ,0,0, ,0,0,0,0]

Pass Location Representation Formula

•  Represent a pass location l as:
•  Each element of r is:
•  d(l, z_i) is the Euclidean distance between l and the center of zone i
•  c_i is an indicator variable that is 1 if i is one of the N closest zones, 0 otherwise
•  In practice, N = 2 seems to lead to the best results
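
A minimal sketch of a dense location vector built from the definitions above (distance to zone centers, an indicator for the N closest zones). The slide's own formula was not recovered, so three things here are assumptions rather than the authors' method: the 6 x 3 zone grid, the 0 to 100 coordinate range, and the use of normalized inverse distance as the weight. The name `dense_representation` is hypothetical.

```python
import math

# Assumed layout (not stated in the slides): 18 zones as a 6 x 3 grid on a
# pitch whose coordinates run from 0 to 100 in both directions.
N_COLS, N_ROWS = 6, 3
ZONE_CENTERS = [
    ((col + 0.5) * 100 / N_COLS, (row + 0.5) * 100 / N_ROWS)
    for col in range(N_COLS)
    for row in range(N_ROWS)
]


def dense_representation(location, n_closest=2):
    """Return an 18-element vector that is nonzero only for the N closest zones.

    Weighting by normalized inverse distance is an assumption made for
    illustration; only d(l, z_i) and the indicator c_i come from the slide.
    """
    dists = [math.dist(location, center) for center in ZONE_CENTERS]
    # c_i = 1 for the N closest zones, 0 for every other zone.
    closest = sorted(range(len(dists)), key=dists.__getitem__)[:n_closest]
    inverse = {i: 1.0 / max(dists[i], 1e-9) for i in closest}
    total = sum(inverse.values())
    return [inverse[i] / total if i in inverse else 0.0 for i in range(len(dists))]


r = dense_representation((50.0, 50.0))
```

With `n_closest=1` the same function yields a one-hot vector, which corresponds to the sparse representation shown earlier.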

How do we represent the location of a collection of passes?

[Figure: pitch diagram with ends labeled DEFENSE and OFFENSE]

Feature Vector for a Possession

•  For each pass in the possession with an origin l_o and destination l_d:
1.  Compute r_lo and r_ld, the vector representations of the pass origin and destination
2.  Compute the matrix R_lod = r_lo ⊗ r_ld, the outer product of the origin and destination representations
3.  Construct the feature vector as: [r_lo, r_ld, flatten(R_lod)]

Origin (zone 10) = [0,0,0,0,0,0, ,0,0, ,0,0,0,0,0,0,0,0]
Destination (zone 14) = [0,0,0,0,0,0,0,0,0,0, ,0,0, ,0,0,0,0]

Origin-Destination Outer Product (10,14): the nonzero entries are R(7,11), R(7,14), R(10,11), and R(10,14), each the product of one nonzero origin element and one nonzero destination element. The excerpt of the matrix shown on the slide is all zeros except for a single entry of .32.
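
The three steps above can be sketched in plain Python as follows. The element values in the example are hypothetical placeholders, since the slide's values were not recovered, and the function names and the row-major flattening order are assumptions.

```python
N_ZONES = 18


def outer_product(u, v):
    """Return the matrix R with R[i][j] = u[i] * v[j]."""
    return [[a * b for b in v] for a in u]


def pass_feature_vector(r_origin, r_dest):
    """Concatenate origin, destination, and the flattened outer product."""
    R = outer_product(r_origin, r_dest)
    flat = [value for row in R for value in row]  # row-major flatten (assumed order)
    return list(r_origin) + list(r_dest) + flat


# Hypothetical weights for illustration only.
# Zones are 1-indexed on the slides, lists are 0-indexed here.
r_o = [0.0] * N_ZONES
r_o[7 - 1], r_o[10 - 1] = 0.25, 0.75   # origin: zones 7 and 10
r_d = [0.0] * N_ZONES
r_d[11 - 1], r_d[14 - 1] = 0.5, 0.5    # destination: zones 11 and 14

features = pass_feature_vector(r_o, r_d)
```

Because each dense vector has two nonzero elements, the outer product has four nonzero entries, which matches the four R(·,·) terms listed on the slide.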

Feature Vector for a Possession

•  18 origin + 18 destination + 324 origin-destination pair features = 360 features
•  The feature vector for a possession is the average of the feature vectors for each individual pass
•  Each feature vector is assigned a label:
–  +1 if the possession ended in a shot
–  -1 otherwise
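
A short sketch of the averaging and labeling described above, assuming each pass has already been turned into a feature vector of equal length. The function names are hypothetical.

```python
def possession_feature_vector(pass_vectors):
    """Average the per-pass feature vectors element by element."""
    if not pass_vectors:
        raise ValueError("a possession needs at least one pass")
    n = len(pass_vectors)
    return [sum(column) / n for column in zip(*pass_vectors)]


def possession_label(ended_in_shot):
    """Return +1 if the possession ended in a shot, -1 otherwise."""
    return 1 if ended_in_shot else -1


# Toy two-feature vectors, for illustration only (real vectors have 360 features).
example = possession_feature_vector([[1.0, 0.0], [0.0, 1.0]])
```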

Experimental Overview

•  Used the first 80% of games in the 2012-2013 season as a training set
•  Evaluated the model on the final 20%
•  Trained an L2-regularized SVM model, finding the w that minimizes:
•  Class-specific cost parameters C_k chosen with 5-fold cross validation
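
The objective on the slide was not recovered. As a hedged illustration only, the sketch below evaluates one common form of an L2-regularized SVM objective with class-specific costs: half the squared norm of w plus a cost-weighted hinge loss. Whether the authors used this exact loss (for example hinge versus squared hinge, or a bias term) is an assumption, and `svm_objective`, `c_pos`, and `c_neg` are hypothetical names.

```python
def svm_objective(w, X, y, c_pos, c_neg):
    """Evaluate 0.5 * ||w||^2 + sum_i C_{y_i} * max(0, 1 - y_i * (w . x_i)).

    Assumed form: plain hinge loss, no bias term, labels in {+1, -1}.
    """
    regularizer = 0.5 * sum(w_k * w_k for w_k in w)
    loss = 0.0
    for x, label in zip(X, y):
        margin = label * sum(w_k * x_k for w_k, x_k in zip(w, x))
        hinge = max(0.0, 1.0 - margin)
        # The per-class cost lets errors on the rarer class be penalized differently.
        loss += (c_pos if label == 1 else c_neg) * hinge
    return regularizer + loss


# Toy data, for illustration only.
value = svm_objective(
    w=[1.0, 0.0],
    X=[[2.0, 0.0], [0.5, 0.0]],
    y=[1, -1],
    c_pos=2.0,
    c_neg=1.0,
)
```

Separate costs per class are a usual way to handle imbalance between the two labels; the slide states only that the C_k were chosen by 5-fold cross validation.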

[Figure: ROC Curve for Shot Prediction Model (AUC = 0.79); x-axis: False Positive Rate, y-axis: True Positive Rate]
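
For readers unfamiliar with the metric, AUC can be read as the probability that a possession that ended in a shot receives a higher model score than one that did not. The sketch below computes it directly from that pairwise definition; it is an illustration with made-up scores, not the authors' evaluation code, and the name `auc` is hypothetical.

```python
def auc(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are +1 (shot) or -1 (no shot).
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == -1]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up scores, for illustration only.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, -1, 1, -1])
```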

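Top Model Weights

[Figure: bar chart of the top model weights; x-axis: Feature, with tick labels including 14 − 18, 16 − 14, 9 − 14, 15 − 14, 18 − 16, 8 − 15, 18 − 14, and 14 − 13; y-axis: Relative Feature Weight, from −0.015 to 0.020; shown with a pitch diagram with ends labeled DEFENSE and OFFENSE]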

Model Weights → Pass Value Metric

•  Weights provide a conceptual map of which locations lead to shots
•  Each pass has three relevant model weights
–  Origin
–  Destination
–  Origin-Destination pair

Pass Shot Value (PSV)

•  Pass Shot Value (PSV) is computed for a pass with origin in zone i and a destination in zone j as:
PSV(i, j) = w_o(i) + w_d(j) + w_od(i, j)
•  This is the sum of the model weights for the corresponding origin, destination, and origin-destination pair, respectively
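
A minimal sketch of the PSV lookup, assuming the learned weight vector follows the same layout as the 360-element feature vector (18 origin weights, then 18 destination weights, then 324 pair weights in row-major order). That layout, the function name, and the weight values in the example are assumptions for illustration.

```python
N_ZONES = 18


def pass_shot_value(w, i, j):
    """Sum the origin, destination, and origin-destination pair weights.

    i and j are 1-indexed zone numbers, as on the slides.
    """
    origin_weight = w[i - 1]
    destination_weight = w[N_ZONES + (j - 1)]
    pair_weight = w[2 * N_ZONES + (i - 1) * N_ZONES + (j - 1)]
    return origin_weight + destination_weight + pair_weight


# Hypothetical weights, for illustration only.
w = [0.0] * (2 * N_ZONES + N_ZONES * N_ZONES)
w[10 - 1] = 0.25                                          # origin zone 10
w[N_ZONES + 14 - 1] = 0.5                                 # destination zone 14
w[2 * N_ZONES + (10 - 1) * N_ZONES + (14 - 1)] = 0.125    # pair (10, 14)

psv = pass_shot_value(w, 10, 14)
```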

PSV as a Player Metric

•  For every player, compute the PSV for every completed pass in which they were the distributor
•  Average these values over the entire course of the season
•  Limited analysis to players with >200 completed passes
–  ~350 players
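
The aggregation above can be sketched as a single pass over (player, PSV) records for completed passes. The record format and the function name are assumptions, and the example uses a small threshold and made-up values so that it stays short.

```python
from collections import defaultdict


def average_psv_by_player(passes, min_passes=200):
    """Average PSV per distributor, keeping players with more than min_passes.

    `passes` is an iterable of (player, psv) pairs, one per completed pass.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for player, psv in passes:
        totals[player] += psv
        counts[player] += 1
    return {
        player: totals[player] / counts[player]
        for player in totals
        if counts[player] > min_passes
    }


# Made-up records, for illustration only.
records = [("a", 1.0), ("a", 2.0), ("a", 3.0), ("b", 5.0)]
ranking = average_psv_by_player(records, min_passes=2)
```

Sorting the resulting dictionary by value in descending order gives a player ranking by average PSV.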

Average PSV for La Liga 2012-2013

Top Players by Average PSV

[Table: top players by average PSV in three columns: Offense, Midfield, Defense]

Annotations on the Offense column: Winner Ballon d'Or 2013, Runner-up Ballon d'Or 2013

Annotations on the full table: Top Goal Scorers, Top Players by Assists

Correlation: ρ = 0.27, p < 0.05

Conclusion

•  The locations of passes can predict whether a possession ends in a shot
•  The relationship between pass location and shots can be used to understand the offensive value of individual passes
•  Average PSV separates players by position, and seems to correlate well with offensive ability within each position
•  Almost every other popular sport is collecting locations of events, so a similar methodology could be applied

Questions?