
(Much) better than lotto tickets: Analytics and NCAA tournament win probabilities
Michael J. Lopez
Assistant Statistics Professor, Skidmore College, @StatsbyLopez
March 2, 2016


Outline

- Introduction
  - Kaggle contest
  - What we did
  - Lucky or good?
- Discussion and Advice
  - Kaggle
  - Traditional pools
  - Sports analytics


Introduction

Background

- Single-elimination tournament with 64-68 teams.
- $2.5 billion wagered on the tournament in 2012 (Boudway 2014; Tsu 2014).
- Reminder: gambling is still illegal in most places.


Scoring formats:

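1. Traditional 1:2:4:8:16:32 (Yahoo, ESPN, etc.)
   - Points for picking winners only; all picks are made before the tournament
   - Points per correct pick double each round, from 1 in the first round to 32 for the champion
2. Kaggle "March Machine Learning Mania"
   - All games judged on predicted probabilities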
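To see what the 1:2:4:8:16:32 weights imply, here is a minimal sketch that scores a bracket from the number of correct picks in each round, assuming a 64-team field (32, 16, 8, 4, 2, and 1 games per round). The function and the example counts are hypothetical illustrations, not any pool site's actual code.

```python
# Points per correct pick in rounds 1-6 under the 1:2:4:8:16:32 format.
ROUND_POINTS = [1, 2, 4, 8, 16, 32]
# Games per round in a 64-team single-elimination bracket.
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]


def bracket_score(correct_per_round):
    """Total points given the number of correct picks in each round."""
    if len(correct_per_round) != len(ROUND_POINTS):
        raise ValueError("expected one count per round")
    for correct, games in zip(correct_per_round, GAMES_PER_ROUND):
        if not 0 <= correct <= games:
            raise ValueError("correct picks must be between 0 and the games in that round")
    return sum(c * pts for c, pts in zip(correct_per_round, ROUND_POINTS))


print(bracket_score(GAMES_PER_ROUND))     # perfect bracket: 192
print(bracket_score([24, 10, 4, 2, 1, 1]))  # 124
```

Because every round is worth 32 points in total, correctly picking the champion alone is worth as much as a perfect first round.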


Kaggle contest

Kaggle description

- About 400 entries from 248 teams in 2014 (500 teams in 2015)
- Predict the win probability for every possible game (2,278 possible matchups among 68 teams)
- Only the 63 games actually played are used in scoring

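- Scoring function:

$$\text{LogLoss} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

where $\hat{y}_i$ is the predicted probability of a win in game $i$, $y_i$ is the actual outcome (1 for a win, 0 for a loss), $n$ is the number of scored games, and $\log$ is the natural logarithm.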
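The LogLoss formula above can be sketched directly in Python. The clipping constant `eps` is an assumption added here as a common safeguard (a prediction of exactly 0 or 1 on a wrong pick would otherwise give an infinite loss); the contest's own handling of that edge case is not described in these slides.

```python
import math


def log_loss(outcomes, probs, eps=1e-15):
    """Mean log loss: outcomes are 0/1, probs are predicted win probabilities."""
    if len(outcomes) != len(probs) or not outcomes:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for y, p in zip(outcomes, probs):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(outcomes)


print(log_loss([1, 0, 1], [0.5, 0.5, 0.5]))  # log(2), about 0.693
print(log_loss([1], [0.9]))  # confident and right: about 0.105
print(log_loss([0], [0.9]))  # confident and wrong: about 2.303
```

Predicting 0.5 for every game always scores log(2), while a confident miss costs far more than a confident hit earns, which is why extreme probabilities are risky under this format.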


- We* won this contest in 2014.
- I'll address a few main questions:
  - What did we do?
  - How lucky did we get?
  - Traditional NCAA pools
  - Lessons transferable to sports analytics

*Jointly with Gregory Matthews (Loyola-Chicago)


What we did

- Two sources of data:
  - Las Vegas point spread data (Model M1)
  - Ken Pomeroy efficiency ratings (Pomeroy 2012) (Model M2)
- Why efficiency?
- Logistic regression: outcome variable of 1 for a win and 0 for a loss.
- Model M1: 1 predictor
  - Spread
- Model M2: 5 predictors
  - Offensive efficiency (home, away)
  - Defensive efficiency (home, away)
  - Neutral-site indicator
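As a rough illustration of the single-predictor Model M1, the sketch below fits a logistic regression by Newton-Raphson (equivalently, IRLS) in plain Python. The eight games are hypothetical, and the predictor is assumed to be the first team's expected margin (the negated Vegas spread, so positive means favored); in R, the sidebar's tool, this would typically be a single `glm(..., family = binomial)` call. This is a sketch of the technique, not the authors' code.

```python
import math


def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_1d(x, y, iters=25):
    """Fit P(win) = sigmoid(b0 + b1 * x) by Newton-Raphson (maximum likelihood).

    Assumes the data are not perfectly separable; otherwise the MLE does not exist.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient (score) of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix [[a, b], [b, c]] with weights p(1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Newton step: solve [[a, b], [b, c]] @ step = [g0, g1].
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1


# Hypothetical games: expected margin for the first team, and 1 if it won.
margins = [-10, -6, -3, -1, 1, 3, 6, 10]
wins = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_1d(margins, wins)
print(f"P(win | 5-point favorite) = {sigmoid(b0 + b1 * 5):.3f}")
```

Model M2 follows the same idea with five predictors instead of one, at which point a matrix solver (or `glm`) replaces the hand-written 2x2 step.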


- Find $w$ to minimize the LogLoss of $w\,\hat{y}_{M1} + (1 - w)\,\hat{y}_{M2}$
- In-sample versus out-of-sample testing
- Our submissions:
  - $S_1 = 0.75\,\hat{y}_{M1} + 0.25\,\hat{y}_{M2}$
  - $S_2 = 0.25\,\hat{y}_{M1} + 0.75\,\hat{y}_{M2}$ (winning entry)
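One simple way to choose the blending weight w is a grid search over [0, 1], scoring each blend by log loss on held-out games. The predictions and outcomes below are hypothetical, and the grid step of 0.01 is an arbitrary assumption, not necessarily how the S1 and S2 weights were chosen.

```python
import math


def log_loss(outcomes, probs):
    """Mean log loss of predicted win probabilities against 0/1 outcomes."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(outcomes, probs)) / len(outcomes)


def best_weight(pred_m1, pred_m2, outcomes, steps=100):
    """Grid-search w in [0, 1] minimizing log loss of w*M1 + (1 - w)*M2."""
    best_w, best_loss = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blend = [w * a + (1 - w) * b for a, b in zip(pred_m1, pred_m2)]
        loss = log_loss(outcomes, blend)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss


# Hypothetical out-of-sample predictions from the two models.
m1 = [0.80, 0.35, 0.60, 0.90, 0.45]
m2 = [0.70, 0.20, 0.40, 0.85, 0.65]
outcomes = [1, 0, 1, 1, 0]
w, loss = best_weight(m1, m2, outcomes)
print(f"w = {w:.2f}, LogLoss = {loss:.4f}")
```

Because the grid includes w = 0 and w = 1, the chosen blend can never score worse on these games than either model alone; whether it also helps on new games is exactly what out-of-sample testing checks.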


Lucky or good?

How lucky were we?

- We simulated the tournament 10,000 times under several different "true" underlying win probabilities:
  - S1 and S2
  - The mean of the top 10 entries
  - The mean of all entries
  - 0.5 for every game
- In each simulated tournament, we scored all entries and counted how often we won.
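The simulation described above can be sketched as follows. Two simplifications are assumptions of this sketch, not details from the talk: games are treated as fixed, independent matchups (in a real bracket, later pairings depend on earlier results), and the three 63-game entries are hypothetical.

```python
import math
import random


def log_loss(outcomes, probs):
    """Mean log loss of predicted win probabilities against 0/1 outcomes."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(outcomes, probs)) / len(outcomes)


def simulate_win_rates(true_probs, entries, n_sims=10_000, seed=2014):
    """Share of simulated tournaments in which each entry has the lowest log loss."""
    rng = random.Random(seed)
    wins = [0] * len(entries)
    for _ in range(n_sims):
        # Draw each game's outcome from the assumed "true" probability.
        outcomes = [1 if rng.random() < p else 0 for p in true_probs]
        scores = [log_loss(outcomes, entry) for entry in entries]
        wins[scores.index(min(scores))] += 1  # ties go to the earlier entry
    return [w / n_sims for w in wins]


# Hypothetical entries: entry 0 matches the "true" probabilities,
# entry 1 shrinks them halfway toward 0.5, entry 2 predicts 0.5 everywhere.
true_probs = [0.5 + 0.045 * (i % 10) for i in range(63)]
entries = [
    true_probs,
    [0.5 + 0.5 * (p - 0.5) for p in true_probs],
    [0.5] * 63,
]
rates = simulate_win_rates(true_probs, entries)
print(rates)
```

Swapping in a different `true_probs` (for example, the mean of other entries, or 0.5 everywhere) mirrors the idea of checking how results change under each assumed truth.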


Results, Kaggle tournament

- Treating our probabilities as the true probabilities:
  - Each of our entries had about a 15% chance of winning
  - Each entry had about a 50% chance of a top-10 finish
- We finished 4th in 2015.


Discussion and Advice



Lessons, Kaggle tournament

1. You always need luck

2. Two prediction models combined outperform either one alone

3. Better data ≥ complex models


Traditional pools

Lessons, traditional pools

1. Find upset pools: people pick too passively

2. Game theory
   - Opponents' picks are known in expectation
   - Find value: Duke 2010 (✓), Kentucky 2015 (X)
   - Consider n

3. Tools
   - kenpom.com, fivethirtyeight.com
   - http://www2.isye.gatech.edu/~jsokol/lrmc/
   - 'Who picked whom'

4. P(Loss) > P(Win)
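One way to make "known in expectation" and "Consider n" concrete is a deliberately crude toy model: assume each of the other n - 1 entrants independently picks a given champion with probability equal to its public pick share (the kind of figure a 'Who picked whom' page reports), ignore the rest of the bracket, and value a champion by P(it wins the title) × (1 - share)^(n - 1). The team names and numbers are hypothetical; this illustrates the game-theory point and is not the method from the talk.

```python
def contrarian_value(p_title, public_share, n_entries):
    """P(your champion wins AND none of the other n - 1 entrants picked it)."""
    return p_title * (1 - public_share) ** (n_entries - 1)


def best_champion(candidates, n_entries):
    """candidates maps team name -> (P(title), public pick share)."""
    return max(candidates,
               key=lambda team: contrarian_value(*candidates[team], n_entries))


# Hypothetical title probabilities and public pick shares.
candidates = {
    "Favorite": (0.45, 0.40),
    "Contender": (0.20, 0.10),
    "Longshot": (0.05, 0.01),
}
for n in (2, 10, 100):
    print(n, best_champion(candidates, n))
# 2 Favorite
# 10 Contender
# 100 Longshot
```

As the pool grows, the heavily picked favorite loses value fastest, so the best pick drifts toward less popular teams even though their title chances are lower.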



Sports analytics

Lessons, sports analytics

1. Research

2. Visualize

3. Practice

4. Share

5. Improve

6. Generalize

Sidebar: R statistical software


Citations

- Lopez, M. J. and Matthews, G. J. 2015. "Building an NCAA Men's Basketball Predictive Model and Quantifying Its Success." Journal of Quantitative Analysis in Sports 11(1): 5-12.

- Carlin, B. P. 1996. "Improved NCAA Basketball Tournament Modeling via Point Spread and Team Strength Information." The American Statistician 50: 39-43.

- Pomeroy, K. 2012. "Ratings Glossary." http://bit.ly/1LGb79q (accessed June 1, 2014).