Amy Langville, Professor of Mathematics, The College of Charleston in South Carolina & Tim Chartier,...
Learning to Play Sports
ML in sports analytics
Dr. Tim Chartier
Tresata
Davidson [email protected]
Dr. Amy Langville
College of Charleston
Dept. of [email protected]
@timchartier
Outline of talk
Play from bench
• data availability
• general interest (a.k.a. cool factor)
• domain knowledge
Application 1: Ranking
Apply here
Picture credit: http://orlandonest.files.wordpress.com/2011/03/2011-march-madness-bracket.gif
How do we do?
• ESPN Tournament Challenge: > 4 million brackets!
• 1st round correct choice = 10 points
• nth round correct choice = 2*(previous round)
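The scoring rule above doubles the value of a correct pick each round, so round n is worth 10 * 2^(n-1) points. A minimal Python sketch of that arithmetic, assuming a six-round bracket with 32, 16, 8, 4, 2 and 1 games per round (the function names and the games-per-round list are illustrative and do not come from the slides):

```python
def points_for_round(n, first_round_points=10):
    """Points for one correct pick in round n (1-based): doubles every round."""
    return first_round_points * 2 ** (n - 1)


def bracket_score(correct_by_round):
    """Total score given the number of correct picks in each round, in order."""
    return sum(
        points_for_round(n) * correct
        for n, correct in enumerate(correct_by_round, start=1)
    )


# Assumption: a 64-team bracket, so six rounds with these game counts.
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]

# Every round is worth 320 points in total, so a perfect bracket scores 1920.
perfect_score = bracket_score(GAMES_PER_ROUND)
```

Because each round carries the same total, one late-round pick counts as much as many early-round picks, which is why optimizing for ESPN score and optimizing for prediction rate can give different answers.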
finding ideal weight 4 prediction
Method 1: crowd source
• 2009 – best bracket – 97%
• 2010 – best bracket – 99%
• 2014 – national media led to thousands of brackets on: marchmathness.davidson.edu
Method 2: learn sports
• vary parameter weights to optimize ESPN score or prediction rate
• subtlety: not all seasons are equally predictive
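One way to picture "vary parameter weights" is a small grid search. The sketch below is only an illustration under stated assumptions: games are weighted linearly by how late in the season they were played, teams are rated by weighted win fraction, and the weight parameter `alpha` is chosen to maximize prediction rate on held-out games. The slides do not specify the weighting scheme or rating method, so all names and the toy data here are hypothetical.

```python
def weighted_ratings(games, alpha, last_day):
    """Weighted win fraction per team; games are (day, winner, loser) tuples.

    Assumed weighting: a game on `day` counts 1 + alpha * day / last_day,
    so alpha = 0 treats all games equally and larger alpha favors recent games.
    """
    wins, total = {}, {}
    for day, winner, loser in games:
        weight = 1.0 + alpha * day / last_day
        for team in (winner, loser):
            total[team] = total.get(team, 0.0) + weight
        wins[winner] = wins.get(winner, 0.0) + weight
    return {team: wins.get(team, 0.0) / total[team] for team in total}


def prediction_rate(ratings, test_games):
    """Fraction of test games where the higher-rated team actually won."""
    correct = sum(
        1
        for _, winner, loser in test_games
        if ratings.get(winner, 0.0) > ratings.get(loser, 0.0)
    )
    return correct / len(test_games)


def best_alpha(train, test, candidates):
    """Return the candidate alpha with the highest prediction rate (first on ties)."""
    last_day = max(day for day, _, _ in train)
    scored = [
        (prediction_rate(weighted_ratings(train, a, last_day), test), a)
        for a in candidates
    ]
    return max(scored, key=lambda pair: pair[0])[1]


# Toy data (made up): A beats B early, B beats A late, B wins the held-out game.
train = [(1, "A", "B"), (10, "B", "A")]
test = [(11, "B", "A")]
chosen = best_alpha(train, test, [0.0, 1.0])
```

Swapping `prediction_rate` for a bracket-scoring objective would tune the same parameter for ESPN score instead. Since not all seasons are equally predictive, a weight tuned on one season may not carry over to another.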
Method 3: mad web
10 years, 50,000 games
Application 2: Cats Stats
Analytics for college teams to support coaching.
sports analytics keys
• coachable
• consumable
• understandable (informed opinion)
impact: coaching
“It kind of blew us away…it really opened our eyes...” – Matt McKillop, NYT
impact: off-season
Player          Poss.  TO%    OR%    EFG%   2P%    3P%
Brian Sullivan  77     14.3%  20.0%  65.6%  67.4%  40.0%
without         56     23.2%  20.8%  55.3%  42.9%  47.1%
Application 3: Lotsa data
missile tech
25 frames/sec
Filtered for Warriors regular season
data we have
SportVU-like data
MasseyRatings.com
column 1 = date of game, measured as days since 1/1/0000
column 2 = date in YYYYMMDD format
column 3 = team 1 index
column 4 = team 1 home field (1 = home, -1 = away, 0 = neutral)
column 5 = team 1 score
column 6 = team 2 index
column 7 = team 2 home field (1 = home, -1 = away, 0 = neutral)
column 8 = team 2 score
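The eight-column layout above maps naturally onto a small record type. The sketch below assumes each game is one comma-separated line of integers. The delimiter, the `Game` and `parse_game` names, and the sample row are assumptions for illustration, not details confirmed by the slides.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Game:
    day_number: int    # column 1: days since 1/1/0000
    played_on: date    # column 2: parsed from YYYYMMDD
    team1: int         # column 3: team 1 index
    team1_site: int    # column 4: 1 = home, -1 = away, 0 = neutral
    team1_score: int   # column 5
    team2: int         # column 6: team 2 index
    team2_site: int    # column 7: 1 = home, -1 = away, 0 = neutral
    team2_score: int   # column 8


def parse_game(line):
    """Parse one comma-separated game row (delimiter is an assumption)."""
    fields = [int(field) for field in line.split(",")]
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    day_number, yyyymmdd, t1, site1, score1, t2, site2, score2 = fields
    played_on = datetime.strptime(str(yyyymmdd), "%Y%m%d").date()
    return Game(day_number, played_on, t1, site1, score1, t2, site2, score2)


# Made-up sample row, for illustration only.
game = parse_game("735678,20140320,12,1,75,48,-1,68")
winner = game.team1 if game.team1_score > game.team2_score else game.team2
```

Only column 2 is used to build the date here, so the day number in column 1 is stored unchanged.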
Tresata Data
For network analysis, Tresata added:
• seed
• coach’s Madness history
• kenpom.com statistics
• every season game (and added game stats)
What can we learn from about 50,000 games?
Data needed
• ESPN bracket challenge scores for past years
• injuries for every game
• score with 2 or 4 minutes left
• learn from Vegas odds
• biometric data
• If we remove a team and it highly affects reranking, what can we learn about such a team for March Madness?
• How can Buddy Hield light up March Madness?
• Compare Jack Gibbs to Stephen Curry in college play.
media ?’s
New Work
How rankable is this dataset?
Rankability
Data
Apps
• Amazon products
• Netflix movies
• Financial networks
• Teams
Intuitive Ideas
one extreme
Dominance graph (very rankable)
Random graph (less rankable)
other extreme
Inconsistency
Uparcs in a rank-ordered graph
5 uparcs
Inconsistency
Uparcs in a rank-ordered graph
Minimum Violations Ranking
3 uparcs
Inconsistency
BUT this measure of rankability is tied to the ranking.
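A short sketch shows why the uparc count depends on the ranking. It assumes each arc is a (winner, loser) pair and that an uparc is an arc whose winner sits below its loser in the given ordering. That convention, the function name and the toy graph are assumptions for illustration.

```python
def count_uparcs(arcs, ranking):
    """Count arcs that point up the ranking.

    arcs: iterable of (winner, loser) pairs.
    ranking: list of teams ordered from best (index 0) to worst.
    """
    position = {team: index for index, team in enumerate(ranking)}
    return sum(1 for winner, loser in arcs if position[winner] > position[loser])


# Toy graph (made up): 1 beat 2, 2 beat 3, 3 beat 1, 1 beat 4.
arcs = [(1, 2), (2, 3), (3, 1), (1, 4)]

uparcs_forward = count_uparcs(arcs, [1, 2, 3, 4])   # only (3, 1) points up
uparcs_reversed = count_uparcs(arcs, [4, 3, 2, 1])  # three arcs point up
```

The same set of games yields different counts under different orderings, so the count alone does not describe the dataset.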
March Madness 2008 sorted by Massey rating
uparcs = 27.2%
March Madness 2014 sorted by Massey rating
uparcs = 26.9%
Goal
Create a rankability measure that is independent of ranking.
k-cycles
2-cycles: 1-2-1, 2-1-2
5-cycles: 1-2-3-4-5-1, 2-3-4-5-1-2, 3-4-5-1-2-3, 4-5-1-2-3-4, 5-1-2-3-4-5
4-paths: 1-2-1-2-1, 2-1-2-1-2
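The sequences listed above are closed walks, counted once per starting node. A standard way to count closed walks of length k is the trace of the k-th power of the adjacency matrix. The slides do not say how the measure is actually computed, so the sketch below is one possible approach. It uses plain Python lists, and the helper names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def closed_walks(adj, k):
    """Number of closed walks of length k, i.e. trace(adj ** k)."""
    n = len(adj)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # identity
    for _ in range(k):
        power = mat_mul(power, adj)
    return sum(power[i][i] for i in range(n))


# Two nodes that beat each other: arcs 1->2 and 2->1.
two_cycle = [[0, 1],
             [1, 0]]

# Five nodes in a single directed cycle: 1->2->3->4->5->1.
five_cycle = [[1 if j == (i + 1) % 5 else 0 for j in range(5)] for i in range(5)]

walks_2 = closed_walks(two_cycle, 2)   # 1-2-1 and 2-1-2
walks_4 = closed_walks(two_cycle, 4)   # 1-2-1-2-1 and 2-1-2-1-2
walks_5 = closed_walks(five_cycle, 5)  # one rotation per starting node
```

These counts depend only on the graph and not on any ordering of the nodes, which matches the stated goal of a measure that is independent of ranking.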
Future Work
If a dataset is not very rankable, which edges should we add to the graph to improve its rankability?
learn to play sports
data questions applications
questions?
Picture credit: http://www.trendir.com/ultra-modern/