redictive Analytics: Prediction Model By Dr. Ash...

74
Sports Predictive Analytics: NFL Prediction Model By Dr. Ash Pahwa IEEE Computer Society San Diego Chapter January 17, 2017 Copyright 2017 Dr. Ash Pahwa 1

Transcript of redictive Analytics: Prediction Model By Dr. Ash...

Page 1: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Sports Predictive Analytics:NFL Prediction Model

ByDr. Ash Pahwa

IEEE Computer SocietySan Diego Chapter

January 17, 2017

Copyright 2017 Dr. Ash Pahwa 1

Page 2: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Outline� Case Studies of Sports Analytics� Sports� Sports Analytics� Applications of Sports Analytics� Sports Analytics Literature� Data Sources� Sports Predictive Models� Regression Model� Multi Variable Regression with Lasso� NFL Prediction Model� Prediction for Super Bowl 2016� Prediction for NFL 2017 Playoffs

Copyright 2017 Dr. Ash Pahwa 2

Page 3: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

San Diego L.A. Chargers

Copyright 2017 Dr. Ash Pahwa 3

Page 4: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Case Studies of Sports Analytics?

Copyright 2017 Dr. Ash Pahwa 4

Page 5: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

What is Sports Analytics?Which Pitcher is Better?

Copyright 2017 Dr. Ash Pahwa 5

2007 BaseballJake Peavy John Lackey

San Diego Padres : National League

L.A. Angles : American League

ERA: 2.54 ERA: 3.01

ERA: Earned Run Average. Mean number of runs yielded per 9 innings

• American league allows Designated Hitter (DH) for pitcher

• National league does not allow Designated Hitter (DH) for pitcher. Pitcher must bat.

Page 6: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Which Variable is the Best Predictor of the Winning Percentage?

Copyright 2017 Dr. Ash Pahwa 6

Page 7: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Pythagorean Theorem� Used in Baseball. Proposed by Bill James� Suppose

� ‘F’ represents a team’s run scored� ‘A’ represents team’s runs allowed

� 𝑃𝑦ℎ𝑡𝑎𝑔𝑜𝑟𝑒𝑎𝑛 𝑊𝑖𝑛𝑛𝑖𝑛𝑔 𝑃𝑒𝑟𝑐𝑒𝑛𝑡𝑎𝑔𝑒 = 𝐹2

𝐹2+𝐴2

� It is called Pythagorean theorem because it is similar to the elementary geometry theorem

Copyright 2017 Dr. Ash Pahwa 7

Example:� Year 2012: Detroit Tigers

� Scored Runs = F = 726� Allowed Runs = A = 670

� 𝑃𝑦ℎ𝑡𝑎𝑔𝑜𝑟𝑒𝑎𝑛 𝑊𝑖𝑛𝑛𝑖𝑛𝑔 𝑃𝑒𝑟𝑐𝑒𝑛𝑡𝑎𝑔𝑒 = 𝐹2

𝐹2+𝐴2= 7262

7262+6702= 527,076

975,976= 0.54

� Total games won = 162*0.54 = 88 games

Page 8: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

How to Convey Information to Decision Makers

Copyright 2017 Dr. Ash Pahwa 8

Data Scientist

• Understand the Sport• Understand the Players• Understand Performance data

Decision Makers

• Management• Operational Personals

Raw Data Meaningful Results

Selection of • Statistical Models• Predictive Models

Page 9: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Goals of Sports Analytics1. Apply Statistical Models to Sporting

Data2. Ratings and Rankings3. Predictive Models4. Player and Team Assessment

Copyright 2017 Dr. Ash Pahwa 9

Page 10: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Statistical ModelsPredictive Models

Copyright 2017 Dr. Ash Pahwa 10

Indices of Central

Tendencies and Variability

Statistics Used to Examine

Relationships

Inferential Statistics

Ratings + Rankings

Predictive Models

Histogram (Frequency Distribution)

Normal Distribution (z-values and

p-values)

Normality Ratings + Rankings

Simple Linear Regression

Mean Covariance Outlier Rank Aggregation Multiple Linear Regression

Median Correlation –Pearson

t-test Polynomial Regression

Mode Rank Correlation –Spearman

ANOVA Logistic Regression

Range, Variance Partial Correlation Chi-Square

Standard Deviation

Page 11: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Ranking of Players and Teams?

Copyright 2017 Dr. Ash Pahwa 11

Page 12: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Pair-Wise Comparison

Copyright 2017 Dr. Ash Pahwa 12

Page 13: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Pair-Wise ComparisonThe Social Network

Copyright 2017 Dr. Ash Pahwa 13

Page 14: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Pair-Wise ComparisonCan be Used for Ranking

Copyright 2017 Dr. Ash Pahwa 14

Sorting the Rating list produces Ranking

Page 15: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Log 5 Method� Developed by Bill James in 1970s� Computes the probability that Team A

will beat Team B� Log 5 formula has nothing to do with

the mathematical function ‘Log’

Copyright 2017 Dr. Ash Pahwa 15

Page 16: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Log 5 Formula� Suppose Team A true winning percentage is 10 out of 16 games

� Percentage of true winning = 𝑝𝑎 = 10/16 = 0.625� Suppose Team B true winning percentage is 7 out of 16 games

� Percentage of true winning = 𝑝𝑏 = 7/16 = 0.438� --------------------------------------------------------------------------------------� The probability that Team A will beat Team B� 𝑝𝑎,𝑏 =

𝑝𝑎−𝑝𝑎∗𝑝𝑏𝑝𝑎+𝑝𝑏−2∗𝑝𝑎∗𝑝𝑏

= 0.625−0.625∗0.4380.625+0.438−2∗0.625∗0.438

= 0.625−0.2741.063−2∗0.274

= 0.3510.515

= 0.681

� ----------------------------------------------------------------------------------------� The probability that Team B will beat Team A� 𝑝𝑏,𝑎 =

𝑝𝑏−𝑝𝑎∗𝑝𝑏𝑝𝑎+𝑝𝑏−2∗𝑝𝑎∗𝑝𝑏

= 0.438−0.625∗0.4380.625+0.438−2∗0.625∗0.438

= 0.438−0.2741.063−2∗0.274

= 0.1640.515

= 0.318

� ---------------------------------------------------------------------------------------------------� 𝑝𝑎,𝑏 + 𝑝𝑏,𝑎 = 1

Copyright 2017 Dr. Ash Pahwa 16

𝑝𝑎,𝑏 =𝑝𝑎 − 𝑝𝑎 ∗ 𝑝𝑏

𝑝𝑎 + 𝑝𝑏 − 2 ∗ 𝑝𝑎 ∗ 𝑝𝑏

Page 17: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Arpad Elo� Physics Professor at Marquette

University� Milwaukee, Wisconsin

� Chess Player� Devised a method to rank chess players� His method was adopted by

� US Chess Federation� World Chess Federation

17Copyright 2017 Dr. Ash Pahwa

Page 18: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Points Gained or LostPoints gained/lost by player A = Points gained/lost by player B

• Before the game• 𝑃𝑙𝑎𝑦𝑒𝑟 𝐴 𝑟𝑎𝑡𝑖𝑛𝑔 𝑟𝐴• 𝑃𝑙𝑎𝑦𝑒𝑟 𝐵 𝑟𝑎𝑡𝑖𝑛𝑔 𝑟𝐵• 𝐷𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑑𝐴𝐵 = 𝑟𝐴 − 𝑟𝐵

• After the game• 𝑃𝑙𝑎𝑦𝑒𝑟 𝐴 𝑟𝑎𝑡𝑖𝑛𝑔 𝑟𝐴′

• 𝑃𝑙𝑎𝑦𝑒𝑟 𝐵 𝑟𝑎𝑡𝑖𝑛𝑔 𝑟𝐵′

• 𝑟𝐴 + 𝑟𝐵 = 𝑟𝐴′ + 𝑟𝐵′

• 𝑃𝑜𝑖𝑛𝑡𝑠 𝑔𝑎𝑖𝑛𝑒𝑑/𝑙𝑜𝑠𝑡 𝑏𝑦 𝑝𝑙𝑎𝑦𝑒𝑟 𝐴 𝑎𝑔𝑎𝑖𝑛𝑠𝑡 𝑝𝑙𝑎𝑦𝑒𝑟 𝐵• 𝜇𝐴𝐵 = 𝐿 𝑑𝐴𝐵

400= 1

1+10−𝑑𝐴𝐵400

18Copyright 2017 Dr. Ash Pahwa

Page 19: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Example

� After the game� 𝑃𝑙𝑎𝑦𝑒𝑟 𝐴 𝑟𝑎𝑡𝑖𝑛𝑔 𝑟𝐴′ = 𝑟𝐴 + 𝐾 𝑆𝐴𝐵 − 𝜇𝐴𝐵 = 2400 + 32 1 − 0.91 = 2403� 𝑃𝑙𝑎𝑦𝑒𝑟 𝐵 𝑟𝑎𝑡𝑖𝑛𝑔 𝑟𝐵′ = 𝑟𝐵 + 𝐾 𝑆𝐵𝐴 − 𝜇𝐵𝐴 = 2000 + 32 0 − 0.09 = 1997

� 𝑟𝐴 + 𝑟𝐵 = 𝑟𝐴′ + 𝑟𝐵′

� 2400 + 2000 = 2403 + 1997

If Player A wins

Draw If Player B wins

S(AB) 1 0.5 0

S(BA) 0 0.5 1

Suppose K = 32 for Chess

• Before the game• 𝑃𝑙𝑎𝑦𝑒𝑟 𝐴 𝑟𝑎𝑡𝑖𝑛𝑔 𝑟𝐴 = 2400• 𝑃𝑙𝑎𝑦𝑒𝑟 𝐵 𝑟𝑎𝑡𝑖𝑛𝑔 𝑟𝐵 = 2000• 𝐷𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑑𝐴𝐵 = 𝑟𝐴 − 𝑟𝐵 = 400• 𝐷𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑑𝐵𝐴 = 𝑟𝐵 − 𝑟𝐴 = −400

If Player A wins

𝜇𝐴𝐵 = 0.91𝜇𝐵𝐴 = 0.09

• After the game• 𝑃𝑙𝑎𝑦𝑒𝑟 𝐴 𝑟𝑎𝑡𝑖𝑛𝑔 𝑟𝐴′ = 𝑟𝐴 + 𝐾 𝑆𝐴𝐵 − 𝜇𝐴𝐵 = 2400 + 32 0 − 0.91 = 2371• 𝑃𝑙𝑎𝑦𝑒𝑟 𝐵 𝑟𝑎𝑡𝑖𝑛𝑔 𝑟𝐵′ = 𝑟𝐵 + 𝐾 𝑆𝐵𝐴 − 𝜇𝐵𝐴 = 2000 + 32 1 − 0.09 = 2029

• 𝑟𝐴 + 𝑟𝐵 = 𝑟𝐴′ + 𝑟𝐵′

• 2400 + 2000 = 2371 + 2029

If Player B wins

19Copyright 2017 Dr. Ash Pahwa

Page 20: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

The Social NetworkElo Formula was Used to pair-wise Comparison of Girls

Copyright 2017 Dr. Ash Pahwa 20

Page 21: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Sports

Copyright 2017 Dr. Ash Pahwa 21

Page 22: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Sports� Inherent part of Human Culture� Sports competition dates back to the dawn of

our species� Greek Olympics : 776 BC

Copyright 2017 Dr. Ash Pahwa 22

Page 23: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Sportsman Spirit� Virtues of sports

� fairness� self-control� courage� persistence

� It has been associated with interpersonal concepts of treating others and being treated fairly

� Maintaining self-control if dealing with others, and respect for both authority and opponents

� GOOD SPORTSMEN HAVE ALWAYS BEEN HELD IN HIGH ESTEEM

Copyright 2017 Dr. Ash Pahwa 23

Page 24: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Most Popular Sports in the World

Copyright 2017 Dr. Ash Pahwa 24

Page 25: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Most Popular Sports in America

Copyright 2017 Dr. Ash Pahwa 25

Page 26: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Sports Analytics

Copyright 2017 Dr. Ash Pahwa 26

Page 27: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Predictive Models� Estimation

� Regression� Classification (win/loss)

� Logistic Regression� Discriminant Analysis

� Linear� Quadratic

� Support Vector MachineCopyright 2017 Dr. Ash Pahwa 27

Page 28: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Goals of Sports AnalyticsPlayer� Discovering hidden talent in a new player� Player Evaluation

� Assessing Player Performance� Which metrics is most important to

assess a players’ performance� Assessing Player Value� How much value a player adds to the

teams’ value

Copyright 2017 Dr. Ash Pahwa 28

Page 29: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Goals of Sports AnalyticsTeam

� Ranking top teams� Accessing Team Performance

� How to compute the value of a team� Which Team Members are best suited to play

against the opposing team� Which strategy to use to play against a team?

� Anticipating Opponents Behavior� Accessing the probability of a win in a sporting

eventCopyright 2017 Dr. Ash Pahwa 29

Page 30: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Need for Prediction Results� Betting on an sporting event

� People betting on sports need to see the prediction results

� Probability of a win� Point Spread

� Fantasy Sports� DraftKings� FanDuel

Copyright 2017 Dr. Ash Pahwa 30

Page 31: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Applications of Sports Analytics

Copyright 2017 Dr. Ash Pahwa 31

Page 32: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Application of Sports PAMovies

Copyright 2017 Dr. Ash Pahwa 32

The film is based on Michael Lewis' 2003 nonfiction book of the same name, an account of the Oakland Athletics baseball team's 2002 season and their general manager Billy Beane's attempts to assemble a competitive team.

In the film, Beane (Brad Pitt) and assistant GM Peter Brand (Jonah Hill), faced with the franchise's limited budget for players, build a team of undervalued talent by taking a sophisticated sabermetric approach towards scouting and analyzing players.

They acquire "submarine" pitcher Chad Bradford (Casey Bond) and former catcher Scott Hatteberg (Chris Pratt), and win 20 consecutive games, an American League record.

Page 33: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Team Selection: Moneyball

Copyright 2017 Dr. Ash Pahwa 33

Page 34: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Moneyball – Billy Beane & Paul DePodesta

Copyright 2017 Dr. Ash Pahwa 34

William Lamar "Billy" Beane III (born March 29, 1962) is an American former professional baseball player and current front office executive.

He is the Executive Vice President of Baseball Operations and minority owner of the Oakland Athletics of Major League Baseball (MLB).

The character of Brand is an invention by the filmmakers; in the excellent Michael Lewis non-fiction book upon which the movie is based, the real-life “Brand” is identified as Paul DePodesta.

Unlike Brand, DePodesta is slender, fit and handsome. He’s also Harvard-educated (not a Yalie – screenwriter Aaron Sorkin‘s private joke).

Page 35: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Application of Sports PABoston Red Sox

� Using Predictive Analytics Strategies Boston Red Sox won 3 world series in Baseball

Copyright 2017 Dr. Ash Pahwa 35

In 2006, Time named Bill James in the Time 100 as one of the most influential people in the world. He is a Senior Advisor on Baseball Operations for the Boston Red Sox.

Page 36: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Sports Analytics Literature

Copyright 2017 Dr. Ash Pahwa 36

Page 37: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Predictive Models for SportsLiterature

Copyright 2017 Dr. Ash Pahwa 37

Page 38: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Statistical Learning� Gareth James� Daniela Witten� Trevor Hastie� Robert Tibshirani

� Stanford University

Copyright 2017 Dr. Ash Pahwa 38

Page 39: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Data Sources

Copyright 2017 Dr. Ash Pahwa 39

Page 40: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Data Sources

Copyright 2017 Dr. Ash Pahwa 40

ValidAccurateCompleteContains derived variables

Page 41: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Data Sources� www.NFL.com� www.NBA.com� www.footballOutsiders.com� www.pro-football-reference.com� www.soccerstats.com� www.basketball-reference.com

Copyright 2017 Dr. Ash Pahwa 41

Page 42: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Data SourceFootball� www.ArmChairAnalysis.com

Copyright 2017 Dr. Ash Pahwa 42

Page 43: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Sports Predictive Models

Copyright 2017 Dr. Ash Pahwa 43

Page 44: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Goals of Predictive Analytics Application: Estimation or Classification

� Estimation – Regression modeling technique is used� Output is a number

� House price� Product sales for next

quarter� GNP growth for the next

quarter� How many points a team

will score

Copyright 2017 Dr. Ash Pahwa 44

� Classification � Logistic Regression� Support Vector Machine� Discriminant Analysis (Linear,

Quadratic)� Naïve Bayes, Decision Trees etc.

modeling techniques are used� Output is a categorical variable

� Sports team will win or lose� Email is junk or not� Which grade student will get� Tweet is positive or negative

Page 45: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Common PA Techniques

� Regression� Linear 2 variables� Linear multi variables� Logistic� Polynomial

� Clustering� Decision Trees� Neural Networks� Naïve Bayes� ARIMA� A few more …

45Copyright 2017 Dr. Ash Pahwa

Page 46: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Regression Model

Copyright 2017 Dr. Ash Pahwa 46

Page 47: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

History of Linear Regression

� Sir Francis Galton and � Karl Pearson� Developed the concepts on

Regression and Correlation in 1900 - 1930

47Copyright 2017 Dr. Ash Pahwa

Page 48: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Definition: Linear Regression� 2 Variable Regression

� How a response variable “y” changes� As the predictor (explanatory)

� variable “x” changes� Multiple Regression

� How a response variable “y” changes� As the predictor (explanatory)

� variables “x1”, “x2”, … “xn” change

48

cxy � 1E

cxxxxy nn ����� EEEE ...332211

Copyright 2017 Dr. Ash Pahwa

Page 49: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Single Variable Polynomial RegressionFirst degree to Fifth Degree

� The 5th degree polynomial regression looks good for training set data

� But with test data it may not perform well� PROBLEM: Over fit for this data

Copyright 2017 Dr. Ash Pahwa 49

Page 50: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Overfitting� If there exists a model with estimated parameters 𝑤′ such that

� Training error (w2) < Training Error (w1)� Generalization (Test) error (w2) > Generalization (Test) Error (w1)

Copyright 2017 Dr. Ash Pahwa 50

Page 51: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Bias and Variance

� Goal is to find out a model complexity where� Generalization (Validation) errors are least� Bias + variance are least

� 𝑀𝑒𝑎𝑛 𝑆𝑞𝑢𝑎𝑟𝑒 𝐸𝑟𝑟𝑜𝑟 = 𝐵𝑖𝑎𝑠2 + 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒� Just like Generalization Error

� We cannot compute Bias and Variance

Copyright 2017 Dr. Ash Pahwa 51

Page 52: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

1st Degree Polynomial Fit𝑦 = 𝑎1𝑥 + 𝑐

Copyright 2017 Dr. Ash Pahwa 52

> #result = lm(y1~x)> result = lm(y1 ~ poly(x,1,raw=TRUE))> summary(result)

Call:lm(formula = y1 ~ poly(x, 1, raw = TRUE))

Residuals:Min 1Q Median 3Q Max

-0.9872 -0.7277 -0.1394 0.7842 1.2692

Coefficients:Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.1919 0.4285 0.448 0.662poly(x, 1, raw = TRUE) -0.0492 0.1120 -0.439 0.668

Residual standard error: 0.8449 on 12 degrees of freedomMultiple R-squared: 0.01581, Adjusted R-squared: -0.0662 F-statistic: 0.1928 on 1 and 12 DF, p-value: 0.6684

> p = predict(result,list(x=xPredict))> lines(xPredict,p,col='black',lwd=3)>

Page 53: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

16thDegree Polynomial Fit𝑦 = 𝑎1𝑥16 + 𝑎2𝑥15 + …+ 𝑎15 𝑥2 + 𝑎16𝑥 + 𝑐

Copyright 2017 Dr. Ash Pahwa 53

> result = lm(y1 ~ poly(x,16,raw=TRUE))> summary(result)Call:lm(formula = y1 ~ poly(x, 16, raw = TRUE))Residuals:ALL 14 residuals are 0: no residual degrees of freedom!Coefficients: (3 not defined because of singularities)

Estimate Std. Error t value Pr(>|t|)(Intercept) 1.820e-01 NA NA NApoly(x, 16, raw = TRUE)1 1.280e+02 NA NA NApoly(x, 16, raw = TRUE)2 -7.656e+02 NA NA NApoly(x, 16, raw = TRUE)3 1.973e+03 NA NA NApoly(x, 16, raw = TRUE)4 -2.895e+03 NA NA NApoly(x, 16, raw = TRUE)5 2.675e+03 NA NA NApoly(x, 16, raw = TRUE)6 -1.631e+03 NA NA NApoly(x, 16, raw = TRUE)7 6.712e+02 NA NA NApoly(x, 16, raw = TRUE)8 -1.869e+02 NA NA NApoly(x, 16, raw = TRUE)9 3.447e+01 NA NA NApoly(x, 16, raw = TRUE)10 -3.944e+00 NA NA NApoly(x, 16, raw = TRUE)11 2.282e-01 NA NA NApoly(x, 16, raw = TRUE)12 NA NA NA NApoly(x, 16, raw = TRUE)13 -5.901e-04 NA NA NApoly(x, 16, raw = TRUE)14 NA NA NA NApoly(x, 16, raw = TRUE)15 1.468e-06 NA NA NApoly(x, 16, raw = TRUE)16 NA NA NA NA

Residual standard error: NaN on 0 degrees of freedomMultiple R-squared: 1, Adjusted R-squared: NaNF-statistic: NaN on 13 and 0 DF, p-value: NA

Page 54: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Under fit or Over fit Model

Copyright 2017 Dr. Ash Pahwa 54

# Degree of

Polynomial

Under fit or Over

fit

1 1 Under fit2 2 Under fit3 4 Under fit4 8 OK5 10 Over fit6 16 Over fit

Page 55: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Lasso Regression

Least Absolute Shrinkage and Selection Operator

Copyright 2017 Dr. Ash Pahwa 55

Page 56: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Cost Function of OLS + Ridge + Lasso

Copyright 2017 Dr. Ash Pahwa 56

OLS

Ridge Regression

Lasso Regression

Page 57: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

NFL Model

Copyright 2017 Dr. Ash Pahwa 57

Page 58: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Data Source� Seasons

� 2000 – 2016� Weeks 1 – 21

� Week#1 – #17: 16 games + 1 bye� Week#18: 4 games: Wild card� Week#19: 4 games: Divisional Playoff� Week#20: 2 games: Conference Championship� Week#21: 1 game: Super Bowl

Copyright 2017 Dr. Ash Pahwa 58

Page 59: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Data Tables

� Arm Chair� Game Data� Play Data� Team Data

� External Source� Stadium Data (Wikipedia)� City Coordinates (Wikipedia)� City GDP (Wikipedia + US Govt.)

Arm Chair Data – 26 Tables

Copyright 2017 Dr. Ash Pahwa 59

Page 60: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Sonny Moore’s Computer Ratinghttp://sonnymoorepowerratings.com/archive

Copyright 2017 Dr. Ash Pahwa 60

Page 61: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

USA Today: Jeff Sagarinhttp://www.usatoday.com/sports/nfl

Copyright 2017 Dr. Ash Pahwa 61

Page 62: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

NFL Prediction

Model Result

Copyright 2017 Dr. Ash Pahwa 62

Page 63: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Comparison of ErrorsSonny Moore & Jeff Sagarin

Copyright 2017 Dr. Ash Pahwa 63

Page 64: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Super Bowl 2016 Prediction by Nate Silver

Copyright 2017 Dr. Ash Pahwa 64

Page 65: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Super Bowl 2016 Prediction by A+

Copyright 2017 Dr. Ash Pahwa 65

Page 66: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Super Bowl 2016All Predicted Models were Wrong

Copyright 2017 Dr. Ash Pahwa 66

Panthers BroncosNate Silver 59% 41%

A+ 56.5% 43.5%

Page 67: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

2017 NFL PredictionWild Card : Playoff

� Predictions for all the 4 games were 100% correct

Copyright 2017 Dr. Ash Pahwa 67

Page 68: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

2017 NFL PredictionDivisional Title: Playoff

Copyright 2017 Dr. Ash Pahwa 68

www.NFLPrediction.co

� Predictions � 2 correct� 2 incorrect� 50% correct

Page 69: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

2017 NFL PredictionConference Championship: Playoff

� Atlanta Falcons vs Green Bay Packers� Atlanta Falcons 61%

� New England Patriots vs Pittsburgh Steelers� New England Patriots 70%

Copyright 2017 Dr. Ash Pahwa 69

Page 70: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

2017 NFL PredictionSuper Bowl

� New England Patriots vs � Atlanta Falcons

�New England Patriots 60%

Copyright 2017 Dr. Ash Pahwa 70

Page 71: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Summary� Sports� Sports Analytics� Applications of Sports Analytics� Sports Analytics Literature� Data Sources� Sports Predictive Models� Regression Model� Multi Variable Regression with Lasso� NFL Prediction Model� Prediction for Super Bowl 2016� Prediction for NFL 2017 Playoffs

Copyright 2017 Dr. Ash Pahwa 71

Page 72: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

UCSD Extension Courses

Winter 2017

72Copyright 2017 Dr. Ash Pahwa

Page 73: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

UCSD Extension coursesWinter 2017

73

Winter 2017

Online

Dates

Sports Predicted Analytics(7 weeks)

January 30 – March 19, 2017

Copyright 2017 Dr. Ash Pahwa

Page 74: redictive Analytics: Prediction Model By Dr. Ash Pahwas3.amazonaws.com/sdieee/2086-Sports+Analytics+IEEE+Seminar+1-17-17.pdfm es ose red wed J E J J E J C = C A= ¿2 ¿2 + º2 to the

Course Content

Copyright 2017 Dr. Ash Pahwa 74

UCSD Sports Predictive Analytics

L# Date Subject1 01/30/17 Introduction to Sports Analytics

1.1 What is Sports Analytics?1.2 Tools for Sports Analytics1.3 Science of Learning from Data

1.4 Basic Stat : Data Types + Histograms + Std Dev2 02/06/17 Statistical Methods Applied on Sports Data

2.1 Normal Distribution2.2 Correlation2.3 Rank and Partial Correlation3 02/13/17 Central Limit Theorem + Hypothesis Testing

3.1 CLT + Parameter Estimation3.2 Hypothesis Testing (*)4 02/20/17 Inference Stat: Chi-Sq+ANOVA+Model Selec

4.1 Chi-Square (*)4.2 ANOVA4.3 Stat Model Selection (*)5 02/27/17 Ratings and Rankings + Rank Aggregation

5.1 Ratings and Rankings (Elo System)5.2 Rank Aggregation (Borda Voting)6 03/06/17 Prediction Using Regression Model

6.1 Introduction to Regression6.2 Regression 2 Variables6.3 Regression Multi Variables

7 03/13/17 Prediction Using Logistic Regression Model 7.1 Logistic Regression Mathematics7.2 Logistic Regression Mechanics7.3 Multivariable Logistic Regression