Amy Langville, Professor of Mathematics, The College of Charleston in South Carolina & Tim Chartier,...

Post on 17-Jan-2017

Learning to Play Sports

ML in sports analytics

Dr. Tim Chartier
Tresata, Davidson College
tichartier@davidson.edu

Dr. Amy Langville
College of Charleston, Dept. of Math
LangvilleA@cofc.edu

@timchartier

Outline of talk

Play from bench

data availability

general interest (a.k.a. cool factor)

domain knowledge

Application 1: Ranking

Apply here

Picture credit: http://orlandonest.files.wordpress.com/2011/03/2011-march-madness-bracket.gif

How do we do?

• ESPN Tournament Challenge: > 4 million brackets!
• 1st round correct choice = 10 points
• nth round correct choice = 2*(previous round)
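The scoring rule above can be sketched in a few lines. This is an illustration only: the function names are made up here, and the six-round, 64-team bracket layout is an assumption that the slide does not state.

```python
def round_points(n, first_round=10):
    """Points for one correct pick in round n (1-based); doubles each round."""
    return first_round * 2 ** (n - 1)


def bracket_score(correct_by_round):
    """correct_by_round[i] is the number of correct picks in round i + 1."""
    return sum(c * round_points(i + 1) for i, c in enumerate(correct_by_round))


# Assumption: a 64-team bracket, so 6 rounds with 32, 16, 8, 4, 2, 1 games.
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]

# Every round is worth the same total (320 points) under this rule.
perfect = bracket_score(GAMES_PER_ROUND)
print(perfect)
```

Because the points double while the number of games halves, a late-round pick is worth as much as many early-round picks, which is why optimizing ESPN score and optimizing prediction rate can give different answers.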

finding ideal weight for prediction

Method 1: crowd source

• 2009 – best bracket – 97%
• 2010 – best bracket – 99%
• 2014 – national media led to thousands of brackets on: marchmathness.davidson.edu

Method 2: learn sports

• vary parameter weights to optimize ESPN score or prediction rate

• subtlety: not all seasons are equally predictive
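A minimal sketch of the weight-tuning idea, under loud assumptions: the rating is a plain weighted winning percentage in which games from the second half of the season count `w` times as much, and the games are a toy list. The rating, the data, and the names `weighted_ratings`, `prediction_rate` and `best_weight` are all hypothetical; the slides do not say which rating method or parameters were tuned.

```python
def weighted_ratings(games, w):
    """games: (season_fraction, winner, loser) with season_fraction in [0, 1].

    Games in the second half of the season get weight w, the rest weight 1.
    """
    wins, total = {}, {}
    for frac, winner, loser in games:
        weight = w if frac >= 0.5 else 1.0
        for team in (winner, loser):
            total[team] = total.get(team, 0.0) + weight
        wins[winner] = wins.get(winner, 0.0) + weight
    return {team: wins.get(team, 0.0) / total[team] for team in total}


def prediction_rate(ratings, tournament):
    """Fraction of (winner, loser) games in which the higher-rated team won."""
    correct = sum(
        1 for winner, loser in tournament
        if ratings.get(winner, 0.0) > ratings.get(loser, 0.0)
    )
    return correct / len(tournament)


def best_weight(games, tournament, candidates):
    """Try each candidate weight; return (weight, rate) for the best one."""
    def rate(w):
        return prediction_rate(weighted_ratings(games, w), tournament)
    w = max(candidates, key=rate)  # first candidate that reaches the maximum
    return w, rate(w)


# Toy season and tournament (made-up data).
season = [
    (0.1, "A", "B"), (0.2, "A", "C"), (0.3, "A", "C"),
    (0.8, "B", "A"), (0.9, "B", "C"),
]
tournament = [("B", "A"), ("A", "C")]

print(best_weight(season, tournament, [1.0, 2.0, 4.0]))
```

Since not all seasons are equally predictive, a weight tuned on one season should be checked against several others before it is trusted.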

Method 3: mad web

10 years, 50,000 games

Application 2: Cats Stats

Analytics for college teams to support coaching.

sports analytics keys

• coachable
• consumable
• understandable (informed opinion)

impact: coaching

“It kind of blew us away…it really opened our eyes...” – Matt McKillop, NYT

impact: off-season

Player          Poss.  TO%    OR%    EFG%   2P%    3P%
Brian Sullivan  77     14.3%  20.0%  65.6%  67.4%  40.0%
without         56     23.2%  20.8%  55.3%  42.9%  47.1%

Application 3: Lotsa data

missile tech

25 frames/sec

Filtered for Warriors regular season

data we have

SportVU-like data

MasseyRatings.com

column 1 = date of game, measured in days since 1/1/0000
column 2 = date in YYYYMMDD format
column 3 = team 1 index
column 4 = team 1 home field (1 = home, -1 = away, 0 = neutral)
column 5 = team 1 score
column 6 = team 2 index
column 7 = team 2 home field (1 = home, -1 = away, 0 = neutral)
column 8 = team 2 score
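The eight-column layout above could be read as follows. This is a sketch that assumes the file is comma-separated with one game per line; the `Game` type, the function names, and the two sample rows are made up for illustration.

```python
import csv
import io
from datetime import date, datetime
from typing import NamedTuple


class Game(NamedTuple):
    day_number: int   # column 1: days since 1/1/0000
    played_on: date   # column 2: YYYYMMDD
    team1: int        # column 3: team 1 index
    home1: int        # column 4: 1 = home, -1 = away, 0 = neutral
    score1: int       # column 5: team 1 score
    team2: int        # column 6: team 2 index
    home2: int        # column 7: 1 = home, -1 = away, 0 = neutral
    score2: int       # column 8: team 2 score


def parse_games(text):
    """Parse comma-separated game rows into Game records."""
    games = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        fields = [field.strip() for field in row]
        played_on = datetime.strptime(fields[1], "%Y%m%d").date()
        n = [int(field) for field in fields]
        games.append(Game(n[0], played_on, n[2], n[3], n[4], n[5], n[6], n[7]))
    return games


def winner(game):
    """Index of the team with the higher score (team 2 on a tie, for simplicity)."""
    return game.team1 if game.score1 > game.score2 else game.team2


# Hypothetical rows, not real results.
SAMPLE = """\
736330, 20160101, 12, 1, 78, 45, -1, 70
736331, 20160102, 45, 0, 66, 7, 0, 81
"""

games = parse_games(SAMPLE)
print([winner(g) for g in games])
```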

Tresata Data

For network analysis, Tresata added:
• seed
• coach’s Madness history
• kenpom.com statistics
• every season game (and added game stats)

What can we learn from about 50,000 games?

Data needed

• ESPN bracket challenge scores for past years
• injuries for every game
• score with 2 min or 4 minutes left
• learn from Vegas odds
• biometric data

• If removing a team strongly changes the ranking, what can we learn about that team for March Madness?

• How can Buddy Hield light up March Madness?

• Compare Jack Gibbs to Stephen Curry in college play.

media ?’s

New Work

How rankable is this dataset?

Rankability

Data

Apps
• Amazon products
• Netflix movies
• Financial networks
• Teams

Intuitive Ideas

one extreme

Dominance graph (very rankable)

Random graph (less rankable)

other extreme

Inconsistency

Uparcs in a rank-ordered graph

5 uparcs

Inconsistency

Uparcs in a rank-ordered graph

Minimum Violations Ranking

3 uparcs

Inconsistency

BUT this measure of rankability is tied to the ranking.
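A small sketch of why the uparc count depends on the ranking. Assumptions: an arc (winner, loser) means the winner beat the loser, and an uparc is an arc whose winner is ranked below its loser. The four-team graph is a made-up example, not the one from the slides, and the brute-force search is only feasible for tiny graphs.

```python
from itertools import permutations


def count_uparcs(arcs, ranking):
    """arcs: (winner, loser) pairs; ranking: teams listed best to worst."""
    position = {team: i for i, team in enumerate(ranking)}
    return sum(1 for winner, loser in arcs if position[winner] > position[loser])


def minimum_violations_ranking(arcs, teams):
    """Try every ordering and keep one with the fewest uparcs."""
    return min(permutations(teams), key=lambda r: count_uparcs(arcs, r))


# Hypothetical results among four teams.
arcs = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 1), (4, 2)]
teams = [1, 2, 3, 4]

naive = count_uparcs(arcs, teams)              # rank teams as 1, 2, 3, 4
mvr = minimum_violations_ranking(arcs, teams)
best = count_uparcs(arcs, mvr)

# A dominance graph has a ranking with no uparcs at all.
dominance = [(i, j) for i in teams for j in teams if i < j]
print(naive, best, count_uparcs(dominance, teams))
```

The same set of games gives a different uparc count under each ranking, so the count describes the ranking as much as the data.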

March Madness 2008 sorted by Massey rating: uparcs = 27.2%

March Madness 2014 sorted by Massey rating: uparcs = 26.9%

Goal

Create a rankability measure that is independent of ranking.

k-cycles

2-cycles: 1-2-1, 2-1-2

5-cycles: 1-2-3-4-5-1, 2-3-4-5-1-2, 3-4-5-1-2-3, 4-5-1-2-3-4, 5-1-2-3-4-5

Goal

Create a rankability measure that is independent of ranking.

k-cycles

2-cycles: 1-2-1, 2-1-2

5-cycles: 1-2-3-4-5-1, 2-3-4-5-1-2, 3-4-5-1-2-3, 4-5-1-2-3-4, 5-1-2-3-4-5

4-paths: 1-2-1-2-1, 2-1-2-1-2
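One way to read the listings above is as closed walks, which can be counted without any ranking: the trace of the k-th power of the adjacency matrix counts closed walks of length k, once per starting vertex. The slides do not give this formula, so treat it as an assumption; the function names and the three small graphs below are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
        for i in range(n)
    ]


def closed_walks(adj, k):
    """trace(adj^k): closed walks of length k, counted per starting vertex."""
    power = adj
    for _ in range(k - 1):
        power = mat_mul(power, adj)
    return sum(power[i][i] for i in range(len(adj)))


# Two teams that each beat the other: 1-2-1 and 2-1-2.
two_cycle = [[0, 1], [1, 0]]

# Directed 5-cycle 1-2-3-4-5-1.
five_cycle = [[1 if j == (i + 1) % 5 else 0 for j in range(5)] for i in range(5)]

# Dominance graph: every team beats all lower-numbered-rank teams, no cycles.
dominance = [[1 if i < j else 0 for j in range(5)] for i in range(5)]

print(closed_walks(two_cycle, 2), closed_walks(two_cycle, 4))
print(closed_walks(five_cycle, 5))
print([closed_walks(dominance, k) for k in range(1, 6)])
```

Length-4 closed walks on the two-team graph (1-2-1-2-1 and 2-1-2-1-2) just repeat the 2-cycle, so a trace-based count mixes genuine k-cycles with shorter cycles traversed more than once.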

Future Work

If a dataset is not very rankable, which edges should we add to the graph to improve its rankability?

learn to play sports

data questions applications

questions?

Picture credit: http://www.trendir.com/ultra-modern/