
An Econometric Explanation of National Olympic Success: London 2012

Ben Shank

ABSTRACT

This paper explores the economic factors that affect a nation’s relative success at the Olympic Games. Given when this paper was written, I was particularly intrigued by the apparent domination of large, relatively wealthy nations at the 2014 Sochi Winter Games. Even from passive observation of medal counts, one can see that the nations that perform well at the Olympics are the same countries that lead the world in size, money, and influence. This led me to focus on two variables: GDP and population.


Table of Contents

Introduction

Literature Review

Theoretical Analysis

Data Analysis

Conclusion

Appendix

Bibliography


Introduction

I had hoped to look at Sochi, since it took place while I was choosing a topic; however, I quickly learned that the Winter and Summer Games are very different. The major difference between Sochi and London is the sheer quantity of medals available at the Summer Games versus the Winter Games. Whereas 26 nations won medals at Sochi, the London data set has 85 medal-winning nations. There were 962 total medals awarded in London, while only 295 were awarded in Sochi. The larger sample makes the London data much more workable. With this many observations, we are comfortable interpreting t-statistics as we would z-statistics, because as the degrees of freedom grow, the t-distribution approaches the standard normal distribution. This is the first benefit of being able to look at the London data.

The second benefit of looking at a Summer Games is the greater diversity of athletic competitions. While the Dutch were able to take a substantial medal share due to their sheer dominance in a relatively narrow event group (speed skating), we do not expect to see this same level of specialization in the Summer Games. Furthermore, the small Alpine nations in Europe have a huge advantage in skiing events, which account for a large share of the medals awarded at the Winter Games, and this concentrates medals in these nations. With our data limitations, we struggle to account for the geographic advantages afforded to the countries that traditionally excel at the Winter Games. The Winter Games place far more emphasis on physical geography in the kinds of events contested. A simple comparison of the concentration of high-level skiers in Colorado versus Alabama shows what we mean by the physical geography advantage: we simply will not find as many good skiers in a flat southern state such as Alabama as we will in a mountainous state like Colorado. (Of course, here we are ignoring the large demographic differences between the two states; Alabama is much poorer than Colorado. Luckily, we account for differences in wealth in our Olympic regressions!)

Literature Review

This review narrows in focus as it moves from the broadest indicators of Olympic success to more peculiar, outlier-type factors. The obvious “Big Two” indicators are GDP and population. The following studies outline the best way to compile the data, and the goal here is to update that work with results from the 2012 Games.

Bernard and Busse examine how both economic factors and population size determine Olympic success. Their model accounts for population by framing Olympic success as the percentage of medals that a country wins at the Games. They examine the medal totals of the Games from 1960 to 1996 and provide a forecast for the 2000 Sydney Games; it would be very intriguing to see how their predictions fared now that we can look back at the data. They postulate that “at the margin, population and high per capita GDP are needed to generate high medal counts” (Bernard 413). Bernard and Busse also examine the role that political attention plays in Olympic success. For example, why have the former Soviet states outperformed the predictions of a model based on population and GDP per capita? The answer lies in the political landscape of these nations; as we saw with Putin and the 2014 Sochi Games, the former Eastern Bloc nations apparently place a heavy emphasis on Olympic success and allocate resources accordingly. While Bernard and Busse do not attempt to quantify this allocation, they use an error term to absorb this source of variability (Bernard 415). They do, however, model Olympic investment as a function of the lagged medal count: past success predicts future success, and this persistence can be traced to investment. They argue that this addition improves the fit of the model (Bernard 415).

Their model is essentially a function of population size, GDP per capita, and an error term (Bernard 415). I think it would be interesting to construct a variable that quantifies the size of each country’s sports market. This would be very difficult for lower-income countries, but it could be an interesting point of comparison among Olympic powerhouses like the US, Russia, Germany, and China. As Bernard and Busse note, “Developing Olympic caliber athletes requires considerable expenditure on facilities and personnel” (Bernard 414). An interesting part of their study is the log transformation of population and GDP per capita. They also find that the host nation typically outperforms the model’s estimate by about 1.8%. Perhaps the most intriguing finding of the study is that total GDP is the crucial determinant in predicting medal counts: “This suggests that it is a country’s total GDP that matters in producing Olympic athletes. This in turn has the implication that two countries with the same GDP will win approximately the same number of medals, even if one is more populous with lower per capita income and the other is smaller with higher per capita GDP. Furthermore, there is strong evidence for durability to a country’s Olympic investments” (Bernard 417).
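The short derivation below shows why the log specification leads to the “total GDP” result. It assumes a simplified model with only two regressors, log population and log GDP per capita, and it drops Bernard and Busse’s lagged-share and host terms. The notation is $s_i$ for medal share, $N_i$ for population, and $Y_i$ for total GDP, so GDP per capita is $Y_i/N_i$.

```latex
\begin{aligned}
s_i &= \beta_0 + \beta_1 \ln N_i + \beta_2 \ln\!\left(\frac{Y_i}{N_i}\right) + \varepsilon_i \\
    &= \beta_0 + (\beta_1 - \beta_2)\,\ln N_i + \beta_2 \ln Y_i + \varepsilon_i
\end{aligned}
```

Suppose the two coefficients are roughly equal ($\beta_1 \approx \beta_2$). Then the population term drops out, and predicted medal share depends only on $\ln Y_i$. That is what the quoted passage means when it says two countries with the same GDP win about the same number of medals.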

Morton examines the 2000 Sydney Games in a similar fashion to Bernard and Busse. However, instead of focusing on overall medal counts, he weights medals so as to determine a “winner” (Morton 147). He builds a model of GDP in millions of USD and population in millions, similar to the model of Bernard and Busse. He then uses the residuals to measure how much each country outperformed or underperformed relative to its economic and population data. He argues that Cuba is the ultimate positive outlier and that India vastly underperformed. We once again see the natural log applied to population and GDP to improve the goodness of fit (Morton 148). We include this study because of its focus on overperformers and underperformers. The next study helps explain why one of these countries performed the way it did.
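Morton’s idea of reading residuals as overperformance or underperformance fits in a few lines of Python. This is a hypothetical illustration, not Morton’s code. The country labels and share values are made up, and the predicted shares are assumed to come from some already-fitted model.

```python
def rank_by_residual(actual, predicted):
    """Return (country, residual) pairs sorted from most over- to most underperforming.

    A residual is actual minus predicted medal share, so a positive value means
    the country won more than its economic and population data would suggest.
    """
    residuals = {name: actual[name] - predicted[name] for name in actual}
    return sorted(residuals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical medal shares (percent); not real Olympic data.
actual = {"Country A": 2.1, "Country B": 0.1, "Country C": 0.9}
predicted = {"Country A": 0.8, "Country B": 1.5, "Country C": 1.0}

for name, resid in rank_by_residual(actual, predicted):
    print(f"{name}: {resid:+.2f} percentage points")
```

In this toy example, Country A plays the role of a Cuba-style overperformer and Country B that of an India-style underperformer.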

While Bernard and Busse present the kind of comprehensive model we want to emulate, Krishna and Haglund present a case study of India’s Olympic success (or lack thereof) in comparison to other nations. They ask why India’s one-sixth share of world population and rapidly growing economy have not translated into even mediocrity at the Olympic Games. Compared with even the poorest countries, India had the lowest medals-per-citizen ratio at the 2004 Athens Games (Krishna 143); Nigeria, Cuba, and Thailand all outperformed India. Given India’s total GDP (and its modest but growing GDP per capita), why has it performed so poorly at the Games? “Why do 10 million Indians win less than one-hundredth of one Olympic medal, while 10 million Uzbeks won 4.7 Olympic medals?” (Krishna 143). The clear point of this paper, which builds on Bernard and Busse, is to figure out why India is such an outlier in underperformance given its GDP and population.

Krishna and Haglund explain this underperformance with what they call the effectively participating population, rather than the total population. The premise is that Olympic athletes do not come from the portion of the population that is uneducated, malnourished, and socially discriminated against (Krishna 144). Thus, the greater the proportion of the population with access to these resources, the more Olympic-caliber athletes, and therefore Olympic medals, a nation will produce. The following passage reflects this logic: “That one billion Indians together won only one Olympic medal seems otherwise hard to explain. Any explanation based on race or genetic characteristics seems facile simply on account of the immense diversity found in India. But if a vast majority of Indians are not effective participants, possibly because information about these events is available to a tiny number - and a tinier number yet know where and how to avail themselves of these opportunities - then a more complete explanation for poor performance comes to hand” (Krishna 144).

The Krishna and Haglund study adds an economic characterization that goes beyond personal wealth as measured by GDP per capita. They include indicators of health and education, plus a measure they call “connectedness.” Connectedness is meant to reflect access to information, and thus to the resources needed to form the effectively participating population. The variables for connectedness are “road length per unit of land area, share of urban population, and radios per capita” (Krishna 144). I believe it is crucial to include this study’s findings because it addresses one of the most peculiar outliers. In my study, I am limited to a GDP and population model. Such a model does not tell the entire story, and much can be learned from the nations that significantly outperform or underperform relative to their “Big Two” (here, GDP per capita and population).

Theoretical Analysis

The key determinants of Olympic success are GDP per capita and population. If we measure Olympic success by total medal count, then intuitively it makes sense that the largest nations will accumulate the most medals. The more athletes a given country sends to the Games, the more chances it has to earn medals. A larger population also means a larger talent pool, and enough athletes to specialize across disciplines. The US is a prime example of this phenomenon. Americans have dominated the Games for years in all sorts of events, from swimming to shooting to speed skating. This comes from our large population and the diversity of sports focus across the nation.

It helps to think of the large-population advantage in terms of high school sports. As a graduate of a tiny private school in Fort Wayne, Indiana, I got a different view of high school sports than the typical Midwestern teenager. My school lacked the sheer size to field a football team: with only 87 students in my graduating class, we simply did not have the number of students necessary to develop a football program. Lacking the “sports infrastructure” kept us off the playing field and led our athletically inclined students to pursue other interests. The same comparison can be made at the country level. Just as Canterbury High School struggled to find the manpower to field a human-capital-heavy sport like football, small countries will struggle to find the human resources to develop enough athletes to take a substantial portion of the total medal count.

While Canterbury lacked a football team altogether, its small size and focused efforts did create an environment conducive to specialization in specific sports. Think of the small mountainous European nations that dominate skiing events at the Olympics, or the Dutch in speed skating. Canterbury used the same approach: while we couldn’t compete in sports requiring lots of people (i.e., track, football, wrestling), Canterbury excelled in tennis, golf, and soccer. These sports require substantial skill development, and given its location, size, and demographics, Canterbury was able to defeat much larger schools in these arenas.

Beyond not fielding teams in some sports, Canterbury also could not keep up with large schools in terms of talent. When we played a large high school of 4,000 students such as Carmel, there was a very evident talent gap. The sheer size of such schools allowed them to develop and draw talented athletes from a much larger population. However, unlike high school sports, the Olympic Games are not stratified by population size. Canterbury struggled to keep pace with the behemoth high schools in the same way that a small nation will struggle to produce Olympic medals compared to giants like Russia, the US, and China. This high school analogy helps explain the theoretical basis for treating population size as one of our key indicators of Olympic medal success.

GDP per capita is the other main determinant of medal success at the Olympics. Athletic development costs money first and foremost: building stadiums, purchasing equipment, and paying for coaching, to name a few explicit costs. Aside from these explicit costs, we also need to think about families at the margin. Imagine the typical family in a poor country. Rather than positioning its children for athletic success, this family will be primarily concerned with allocating its resources (time and money) toward feeding its children. Getting a son or daughter onto the best travel soccer team, or getting advice from the best (which typically means the most expensive) figure skating coach in the area, is the least of a poor parent’s worries. Simply put, the lack of time and money to devote to athletic development is a primary reason why GDP per capita is a key determinant.

Beyond the simple allocation of funds toward sports, we can also use GDP per capita as a proxy for overall health in a country. By almost any measure (and hundreds of papers could be written on this topic alone), nations that are wealthier on a per capita basis are healthier. A nation cannot hope to develop Olympic athletes if it is plagued by malnutrition, infant mortality, and infectious disease, which wipe out much of the pool of potential athletes. Malnutrition in particular prevents millions of young people from developing physically at the rate and efficiency of the average American or Western European. Although India has over 1.2 billion people, the fact that it has one-third of the world’s malnourished children offers a glaring explanation for its lack of Olympic success (Krishna 145). Malnutrition of course has far graver consequences than a lack of Olympic success, but it is another reason for including GDP per capita as an indicator on theoretical grounds.

Data Analysis

This paper uses Ordinary Least Squares (OLS) to estimate the coefficients on the key independent variables that determine the dependent variable, Olympic success. In this analysis, I explore the results of the 2012 London Summer Games. This type of analysis has been done several times for past Olympic Games, but I believe there is much to be gained by investigating the determining factors in the most recent Summer Games.

The dependent variable is medal share: a country’s total medals won divided by the total number of medals awarded at the Games, expressed as a percentage. We then look at the impact that our independent variables have on a country’s medal share. The data set originally contained economic data on 204 countries, which is effectively every nation in the world, and we estimate the models on two samples. In one set of regressions, we use all 204 countries, each of which had at least one representative in London. However, some countries are not major players; they may have sent only one athlete. In the other set, we account for this by dropping the countries that failed to win a medal. Dropping all these countries is no doubt an issue for our analysis. It is still useful, because it lets us focus on the “athletically relevant” countries that competed.
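Building the dependent variable and the Winners Only sample takes only a few lines. A minimal Python sketch follows. It assumes the 962-medal total given in the introduction as the denominator, and the country labels and counts are hypothetical.

```python
TOTAL_MEDALS_LONDON = 962  # total medals awarded in London, as stated in the text


def medal_shares(medal_counts, total_awarded=TOTAL_MEDALS_LONDON):
    """Convert raw medal counts into medal share, in percent of all medals awarded."""
    return {country: 100.0 * medals / total_awarded
            for country, medals in medal_counts.items()}


def winners_only(shares):
    """Keep only countries with a positive medal share (the 'Winners Only' sample)."""
    return {country: share for country, share in shares.items() if share > 0}


# Hypothetical counts for illustration only, not actual London 2012 results.
counts = {"Country A": 48, "Country B": 0, "Country C": 3}
shares = medal_shares(counts)   # all participants
sample = winners_only(shares)   # Country B is dropped
```

The all-participants regressions would use `shares`; the Winners Only regressions would use `sample`.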

I began by using total medal count as the primary dependent variable; however, after running several OLS regressions with total medal count versus medal share percentage, I found that the R-squared values, a statistic representing the goodness of fit of a line, demonstrated …

For the London 2012 Summer Games, we have the total medal counts, the breakdown of medal counts, and population and GDP data. (The model equations can be seen in the Appendix and in Table 1.) Each regression includes a random error term to account for chance variation. The error term (represented by epsilon in the Model Equations portion of the Appendix) also absorbs the effects of omitted variables. When attempting to predict Olympic success, it is important to remember that more variables are at play than simply the wealth and size of a nation. Examples include a nation’s geography, climate, and political situation, and any combination of these factors influences a nation’s relative Olympic success. Some of the trends in the Bernard and Busse study involve the political affiliations of traditionally successful nations like Russia and China. Bernard and Busse follow these countries’ success over time and use a dummy variable to capture the apparent influence of communism on Olympic success. The coefficients on this dummy variable were highly statistically significant. This is one example of the influences we fail to measure explicitly in our regressions and instead leave to the error term.


OLS relies on the assumptions of the Classical Econometric Model. One of these is that the errors are homoskedastic, meaning that the variance of the errors does not change across values of the independent variables. However, a Breusch-Pagan test showed evidence of heteroskedasticity (p-values near 0), so I chose to report heteroskedasticity-robust standard errors.
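The two steps can be written from scratch for a single regressor, which keeps the sketch short. This is an illustration under stated assumptions, not the Stata routine used for the paper. The Breusch-Pagan statistic here is the n·R² form, computed by regressing squared residuals on the regressor. The robust standard error is White’s estimator with the n/(n-k) scaling that Stata’s `vce(robust)` option applies. Stata’s `estat hettest` reports a closely related version of the test by default.

```python
import math


def ols_simple(x, y):
    """Fit y = b0 + b1*x by OLS; return (b0, b1, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, resid


def breusch_pagan(x, resid):
    """n*R^2 form of the Breusch-Pagan test with one regressor.

    Regress the squared residuals on x; under homoskedasticity the statistic
    is approximately chi-square with 1 degree of freedom.
    """
    e2 = [e * e for e in resid]
    _, _, u = ols_simple(x, e2)
    mean_e2 = sum(e2) / len(e2)
    tss = sum((v - mean_e2) ** 2 for v in e2)  # assumes squared residuals vary
    rss = sum(v * v for v in u)
    lm = max(len(x) * (1.0 - rss / tss), 0.0)  # guard against tiny negative rounding
    p_value = math.erfc(math.sqrt(lm / 2.0))   # upper tail of chi-square(1)
    return lm, p_value


def robust_se_slope(x, resid):
    """Heteroskedasticity-robust (HC1) standard error of the OLS slope."""
    n, k = len(x), 2  # k counts the intercept and the slope
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    meat = sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid))
    return math.sqrt(meat / sxx ** 2 * n / (n - k))


# Tiny made-up example, not the paper's data.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 10.0]
b0, b1, resid = ols_simple(x, y)
lm, p = breusch_pagan(x, resid)
se = robust_se_slope(x, resid)
```

A p-value near 0, as the paper reports, rejects homoskedasticity. With only four points, this toy case says little either way.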

Winners Only

First, we look at GDP per capita as the primary independent variable after dropping all nations that failed to bring home a medal. Model 1 is a simple regression of medal share percentage on this variable. We measure GDP per capita in units of $1,000, both for readability and because a $1 increase in average GDP per capita probably will not “move the lever” on Olympic success. This first regression gives an estimated slope coefficient of .00014. This implies that a $1,000 increase in GDP per capita is associated with an expected increase in medal share at the London Summer Games of about .014%, give or take .0095%. This estimate is not statistically significant, with a p-value of 14%.
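Model 1’s numbers are a good test of the introduction’s claim that t-statistics can be read as z-statistics. The sketch below treats t = coefficient / standard error as standard normal. That is a large-sample approximation, not the exact t-distribution Stata uses. The coefficient and standard error are expressed in percentage points of medal share per $1,000.

```python
from statistics import NormalDist


def normal_approx_p(coef, se):
    """Return (t, two-sided p) treating t = coef / se as standard normal."""
    t = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Model 1 estimate from the text: .014 (percentage points per $1,000), SE .0095.
t_stat, p_value = normal_approx_p(0.014, 0.0095)
print(f"t = {t_stat:.2f}, approximate p = {p_value:.3f}")
```

This gives a p-value of roughly 0.14, in line with the reported 14%. At this sample size, the exact t-distribution gives slightly larger p-values than the normal. For example, Model 2’s t-statistic of 1.93 maps to about 5.4% under the normal approximation, versus the reported 5.7%.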

In Model 2, we take the natural log of GDP per capita. The coefficient therefore implies that a 1% increase in GDP per capita is associated with a 0.031% increase in medal share. This estimate is not quite statistically significant, with a t-statistic of 1.93, which corresponds to a p-value of about 5.7%. Given the rise in R-squared from .026 in Model 1 to .043 in Model 2, we have reason to believe that the log-transformed specification fits the data better than the linear one.
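In a lin-log model, the “1% increase” reading is an approximation. The sketch below assumes medal share is measured in percentage points. Under that assumption, Model 2’s statement (about 0.031 points per 1% rise in GDP per capita) corresponds to a coefficient of roughly 3.1. That back-calculated coefficient is an assumption, not a number reported in the tables.

```python
import math


def linlog_effect(beta, pct_change):
    """Change in y implied by y = b0 + beta * ln(x) when x rises by pct_change percent.

    Returns (exact, approx): exact uses beta * ln(1 + pct/100); approx is the
    rule of thumb beta * pct / 100, which is accurate only for small changes.
    """
    exact = beta * math.log(1.0 + pct_change / 100.0)
    approx = beta * pct_change / 100.0
    return exact, approx


beta_model2 = 3.1  # assumed coefficient implied by the text's 0.031-per-1% reading
one_pct = linlog_effect(beta_model2, 1.0)
ten_pct = linlog_effect(beta_model2, 10.0)
```

For a 1% change the two agree closely (about 0.0308 versus 0.031). The 10% changes used later are a different story: since ln(1.1) is about 0.0953 rather than 0.1, the rule of thumb overstates the effect by roughly 5%.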


Model 3 brings population size into the mix. As discussed above, population is a definite indicator of how many medals a country will win in a given year. Model 3 is a regression of medal share on population and GDP per capita with no transformations of the data. We interpret the GDP per capita coefficient of .00019 the same way as in Model 1. We interpret the population coefficient to mean that a 10 million person increase in population is associated with a .00048% increase in medal share. Model 3 is a clear improvement over Model 2 in that it accounts for the sheer size of a country’s population, and it yields an R-squared of .25. This model also demonstrates the effects of omitted variable bias, which in the earlier models stemmed from neglecting population as a determinant of medals won. The bias shows up in the estimated slope coefficient on GDP per capita, which moves from .00014 in Model 1 to .00019 in Model 3 once population is included. With population in the model, variation in medal share that is really due to population is no longer attributed to GDP per capita.
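The shift in the GDP per capita coefficient between Model 1 and Model 3 is an instance of the in-sample omitted variable bias identity. The short-regression slope equals the long-regression slope, plus the omitted variable’s coefficient times the slope from regressing the omitted variable on the included one. The sketch below checks that identity on made-up, noise-free numbers, with x1 standing in for GDP per capita and x2 for population. None of it is the paper’s data.

```python
def slope_simple(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxy / sxx


def slopes_two(x1, x2, y):
    """OLS slopes (b1, b2) for y = b0 + b1*x1 + b2*x2, via demeaned normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # nonzero unless x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Made-up, noise-free data: true effects are 2 (x1) and 3 (x2).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # stands in for GDP per capita
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]  # stands in for population, correlated with x1
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

short_b1 = slope_simple(x1, y)            # x2 omitted
long_b1, long_b2 = slopes_two(x1, x2, y)  # x2 included
delta = slope_simple(x1, x2)              # slope of x2 on x1
# In-sample identity: short_b1 == long_b1 + long_b2 * delta
```

Here x1 and x2 move together, so omitting x2 pushes the x1 slope up (5 instead of 2). In the paper, the coefficient instead rose when population was added. Assume Models 1 and 3 use the same 85 countries and the population coefficient is positive. The identity then implies that GDP per capita and population are negatively related in that sample.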

Model 4 uses a semilog specification, taking the natural log of both independent variables: population and GDP per capita. We interpret the population coefficient to mean that a 10% increase in population size is associated with a .067% increase in medal share for a given nation. A 10% increase in GDP per capita is associated with a .049% increase in the share of medals won at the 2012 London Games. This regression yields a much better R-squared than Model 3: .37 versus .25.

I initially believed that Model 5 would be an improvement over Model 4. I thought that undoing the logarithmic transformation on GDP per capita, while leaving it on population, would improve the fit. However, when we run the regression, the R-squared falls to .32. This provides further evidence that the logarithmic transformation of GDP per capita is appropriate on the basis of fit.

All Participants

It would be mind-numbing to repeat the interpretation of each coefficient in each model, since all we did was add the 119 (204-85) non-medal winners to this second set of regressions, found in Table 2. So, sparing both the reader and myself, we will discuss the broader implications of this second set. When we run the regressions, we quickly see that our estimated coefficients have fallen across the board; a side-by-side comparison of Table 1 and Table 2 shows at first glance that the values are smaller. Our R-squared values have also fallen. We attribute this to the non-medal winners added back in, which raise the number of observations from 85 to 204. Because each of these countries has an actual medal share of exactly 0%, our fitted OLS regression will not fit as well. Even with modest income and population, most countries have fitted values above zero. In other words, we predict that many countries will win some portion of the medals available, however small. So every time one of these non-winners posts a 0 in medal share, we overestimate its medal share. The noticeably smaller R-squared values across the board reflect this.


Including the non-winners forces us to confront medal share’s lower bound of 0. While this causes headaches in our plain-vanilla OLS regressions, we do have a valuable remedy (thank you to Dr. Howland): tobit analysis. Tobit analysis is used when the dependent variable is censored at a lower bound, an upper bound, or both. Since medal share is a percentage, it has a lower bound of 0% and an upper bound of 100%. We don’t really worry about the upper bound: I do not think that even the most spirited “USA!” chants will push us to the point of winning every single medal at the Olympics (100% medal share). We are, however, very aware of the lower bound of 0%, so aware that we dropped 119 observations for one of our sets of regressions.
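For readers new to tobit, the sketch below writes out the log-likelihood of a tobit model left-censored at zero, which is the setup described here. Censored countries (zero medal share) contribute the probability that their latent share falls at or below 0. Medal winners contribute an ordinary normal density. Maximizing this over the coefficients and σ is what a call like Stata’s `tobit ..., ll(0)` does. The function below only evaluates the likelihood for given linear predictions; it is not an estimator.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF; erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_loglik(y, xb, sigma):
    """Log-likelihood of a tobit model left-censored at 0.

    y     -- observed medal shares (0 for non-winners)
    xb    -- linear prediction x'beta for each country (the latent mean)
    sigma -- standard deviation of the latent error
    """
    total = 0.0
    for yi, mi in zip(y, xb):
        if yi <= 0.0:
            # Censored: P(latent share <= 0) = Phi(-x'beta / sigma).
            total += math.log(norm_cdf(-mi / sigma))
        else:
            # Uncensored: normal density of the latent share, scaled by 1/sigma.
            total += math.log(norm_pdf((yi - mi) / sigma) / sigma)
    return total
```

For example, `tobit_loglik([0.0, 1.2], [-0.5, 1.0], 1.0)` combines one censored non-winner and one medal winner, using hypothetical values.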

Since we already have some idea that Model 4, with its semilog functional form, is the best-fitting regression, we conduct tobit analysis on Model 4 alone. The tobit analysis lets us account for the lower bound instead of dropping these observations altogether, as we do in the Winners Only analysis. The results are shown in the Tobit column of Table 2. The slope coefficients on both GDP per capita and population are substantially larger than in the other two sets. Since both variables are log-transformed, we interpret the coefficients just as we would in a semilog model. Thus, we associate a 10% increase in GDP per capita with a .89% increase in medal share. Given that 962 medals were available, this corresponds to an increase in total medal count of about 8.6 medals. Read loosely, a nation hoping to win about 8.6 more medals should plan to increase its general wealth by about 10%. Similarly, we associate a 10% increase in population with an increase in medal share of about .82%. This implies that a nation with 10% more people would be predicted to win about 7.9 more medals at the 2012 Olympics. The key difference with the tobit model is the magnitude of these slope coefficients.
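The conversion from medal share to medals is simple arithmetic on the text’s figures: the change in share (in percentage points) times 962 medals, divided by 100. One caveat applies, and it is general tobit background rather than a result of this paper. Tobit coefficients describe effects on the latent (uncensored) share. The effect on a country’s expected observed share is the coefficient scaled by Φ(x'β/σ), the probability that the country lands above zero. The medal figures are therefore most accurate for countries well clear of the zero bound. The `xb` and `sigma` values used with `observed_share_effect` are placeholders, not estimates from the paper.

```python
from statistics import NormalDist

TOTAL_MEDALS = 962  # medals awarded at London 2012, per the text


def share_change_to_medals(pp_change, total=TOTAL_MEDALS):
    """Convert a change in medal share (percentage points) into a number of medals."""
    return pp_change / 100.0 * total


def observed_share_effect(latent_effect, xb, sigma):
    """Scale a tobit (latent) effect by Phi(x'beta / sigma): effect on E[observed share]."""
    return latent_effect * NormalDist().cdf(xb / sigma)


gdp_medals = share_change_to_medals(0.89)  # 10% more GDP per capita: about 8.56 medals
pop_medals = share_change_to_medals(0.82)  # 10% more population: about 7.89 medals
```

A country sitting right at the bound (x'β = 0) would see only half of the latent effect in its expected observed share.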

This discrepancy is especially evident in Graph 1, which was generated in Stata. The y-axis measures the nation’s predicted medal share, and the x-axis measures the natural log of population. The red marks represent values predicted by the tobit model, and the blue marks represent the linear predictions from OLS regression. Several things jump out at us from the graph. First, the lower bound shows up in the OLS predictions: the blue marks do not really fall below 0. The exceptions are a few extreme cases where the negative constant overwhelms the contributions of GDP per capita and population. Second, GDP per capita does not explicitly appear on the graph. While it is not measured on an axis, we can read the vertical position of the marks as reflecting GDP per capita. Perhaps the best way to demonstrate this is to look at China and India. In both the tobit and OLS models, China and India are the marks on the far right of the graph because of their massive populations of over 1 billion people. However, even though the two are roughly the same size, China’s mark is substantially higher on the y-axis because the models predict that China will win a greater share of the medals. This vertical gap effectively reflects the difference in GDP per capita implied by the models. Thus, although China and India are about the same size, our models predict that China will win more medals than India because of China’s higher GDP per capita.
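The worked equation below makes the China and India comparison concrete. It uses the Model 4 form, with both regressors logged, $\hat\beta_1$ on log population, and $\hat\beta_2$ on log GDP per capita. The vertical gap between two countries’ predictions then decomposes as follows. For the tobit model, the same algebra applies to the latent index before censoring.

```latex
\hat{s}_{\text{China}} - \hat{s}_{\text{India}}
  = \hat\beta_1 \ln\!\left(\frac{N_{\text{China}}}{N_{\text{India}}}\right)
  + \hat\beta_2 \ln\!\left(\frac{\text{GDPpc}_{\text{China}}}{\text{GDPpc}_{\text{India}}}\right)
  \approx \hat\beta_2 \ln\!\left(\frac{\text{GDPpc}_{\text{China}}}{\text{GDPpc}_{\text{India}}}\right)
```

The approximation uses $N_{\text{China}} \approx N_{\text{India}}$, which makes the log population ratio close to zero. What remains is the GDP per capita term, which is the vertical gap seen in Graph 1.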

Finally, we see that the red tobit values have a much steeper slope than the flatter linear regression values. The tobit model effectively treats the non-medal winners as countries whose economic indicators are so poor for Olympic success that, if it were mathematically possible, they would have a negative medal share. Clearly, a country cannot win a negative share of medals. Still, if there were some way to index Olympic success based on these predictions, we could measure just how poorly these lower-end countries performed. Essentially, the 0% medal share acts like a stop-loss: these countries really would look worse if we had a better indicator of their athletic prowess. Of course, until we live in a more perfect world, this is not possible given the limitations of math and statistics. However, once we reach that utopia, econometrics will no longer be taught, because empirical research will perfectly match theoretical predictions.

In sum, we find that the tobit model with the natural logs of GDP per capita and population is the best model. As discussed above, the tobit model handles the lower bound on medal share better than dropping all of the non-winners does. Running OLS on all participating nations produces slope coefficients that are too weak: roughly one-third the size of the tobit coefficients and almost one-half the size of the winners-only coefficients. The larger tobit estimates, obtained without discarding any countries, give us more confidence in our predictions.


Table 3 lists some of the more interesting countries with their actual medal share, linear predicted medal share, and tobit predicted medal share, giving a more concrete idea of how well these predictors do. I included several outliers in the table, which makes both models look worse than they actually are. For the countries that outperformed the models (the US, China, and Russia), the tobit model came much closer to the actual medal share than the simple linear regression. The table also shows an example of a negative tobit prediction: Haiti, with all of its issues, is predicted by the tobit model to earn a negative share of the medals. While this is of course impossible in practice, it illustrates the earlier point about boundedness, which can also be seen in Graph 1. I include Canada simply because it performs almost exactly as the linear model predicts and slightly underperforms the tobit model. One variable I did not account for, but which past studies include, is a dummy variable for the host nation. The Bernard and Busse study looks at the Games over time and includes a host-nation dummy. They conclude that hosting gives a substantial bump, with host nations taking roughly 4% more of the medal share than they would have had they not been the host. This factor could help explain the outperformance of the UK, which hosted the Games and took roughly 4% more of the medal share than the tobit model predicted.

Previous Work and Possible Errors

My data appears to be roughly consistent with prior economists’ work. As expected, we find that GDP per Capita and population are highly relevant factors in predicting Olympic success, as demonstrated by our multiple regressions using both OLS and tobit estimation as well as natural logarithmic transformations. I effectively replicated the Bernard and Busse approach in that I used tobit regression on the natural logarithms of GDP per Capita and Population to estimate medal share for the 2012 London Olympics. With their more robust resources, however, they include several other variables that I wish I could have employed in my data set. Bernard and Busse also look at the Olympics over time, whereas I look only at London; since I examine only one Summer Games, there is some concern that my data is not representative of Olympic success as a whole. Originally, I researched the 2014 Winter Games and noticed that the small alpine nations and the former Soviet nations substantially outperformed the models' predictions. These outliers are noteworthy and suggest that important variables have been omitted, such as the politics, geography and climate factors I have mentioned throughout the paper.
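For reference, the tobit specification described above can be written as follows. This is a sketch assuming the standard tobit model censored at zero with normally distributed errors, not necessarily the exact Bernard and Busse specification. Here x_i'b = b_0 + b_1 ln(GDP per Capita_i) + b_2 ln(Population_i), and Φ and φ are the standard normal CDF and density. The first sum in the log-likelihood is how non-medal-winning nations enter the estimation.

```latex
\begin{aligned}
\text{Medal Share}_i^{*} &= b_0 + b_1 \ln(\text{GDP per Capita}_i) + b_2 \ln(\text{Population}_i) + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma^2) \\
\text{Medal Share}_i &= \max\left(0,\ \text{Medal Share}_i^{*}\right) \\
\ln L(b, \sigma) &= \sum_{i:\,\text{Medal Share}_i = 0} \ln \Phi\!\left(-\frac{x_i' b}{\sigma}\right)
  + \sum_{i:\,\text{Medal Share}_i > 0} \left[\ln \phi\!\left(\frac{\text{Medal Share}_i - x_i' b}{\sigma}\right) - \ln \sigma\right]
\end{aligned}
```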

Conclusion

The obvious takeaway from this paper is that the wealth and size of a nation drive its Olympic success. We see this in our models, as the slope coefficients on GDP per Capita and population are routinely statistically significant. We expected this result, and it seemed elementary to propose a null hypothesis stating that these slope coefficients would equal zero. Had we done so, we would of course have rejected the null based on the F-tests of our regressions. The p-values associated with these F-tests (the probability of obtaining an F-statistic at least as large as the one observed if all slope coefficients were truly zero) were very small, which gives us reason to reject the null hypothesis that GDP per capita and population play no role in determining medal share.
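The overall F-statistic for an OLS regression can be recovered from its R-squared, which makes the claim above easy to check against the appendix tables. A minimal sketch using SciPy, assuming an OLS model with an intercept and k slope coefficients (it does not apply to the tobit column, which has no R-squared); `overall_f_test` is a hypothetical helper name, and because the reported R-squared values are rounded, the result is approximate.

```python
from scipy import stats


def overall_f_test(r_squared, n_obs, n_slopes):
    """Overall F-test that all slope coefficients are zero, computed from R-squared.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) with k slopes and n observations.
    Returns (F statistic, p-value).
    """
    df_num = n_slopes
    df_den = n_obs - n_slopes - 1
    f_stat = (r_squared / df_num) / ((1.0 - r_squared) / df_den)
    p_value = stats.f.sf(f_stat, df_num, df_den)  # upper-tail probability under the null
    return f_stat, p_value


# Table 2, Model 3 (all 204 nations, GDP per capita and population): R-squared = .26
f_stat, p_value = overall_f_test(0.26, 204, 2)
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")
```

With F of roughly 35 on (2, 201) degrees of freedom, the p-value is far below any conventional significance level.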

Given that our independent variables are highly influential, the next important goal was finding the model that best fits the actual results. Using natural logarithmic transformations was crucial here. While I expected the natural logarithm to work well on the population variable, I was surprised to find that the transformation also improved the fit of GDP per Capita. The improvement in R-squared values following the transformation suggests that the relationship between medal share and the two independent variables is closer to logarithmic than linear.
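In a semilog model, where medal share depends on ln(GDP per Capita), multiplying GDP per capita by a factor r changes predicted medal share by b·ln(r). Every doubling therefore adds the same amount regardless of the starting level, a form of diminishing returns to wealth. A minimal sketch, assuming the Table 1, Model 4 coefficient of .0049 on the logged GDP term; `semilog_change` is a hypothetical helper name.

```python
import math


def semilog_change(b_ln, ratio):
    """Change in predicted medal share when the logged regressor is multiplied by `ratio`.

    In a level-log (semilog) model, delta = b_ln * ln(ratio).
    """
    return b_ln * math.log(ratio)


B_LN_GDP = 0.0049  # Table 1, Model 4: coefficient on ln(GDP per Capita)

print(semilog_change(B_LN_GDP, 2.0))   # doubling GDP per capita: about .0034 of the medals
print(semilog_change(B_LN_GDP, 1.01))  # a 1% increase: roughly B_LN_GDP / 100
```

By contrast, in the level Model 3 each additional $1000 of GDP per capita adds the same .00019, whether a country starts poor or rich.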

The crucial development in my analysis was to employ tobit regression rather than the “standard” OLS regression methods we used in class. The tobit model produced larger slope coefficients on the independent variables and predicted actual medal shares more closely. Whereas in the linear model we dealt with the lower bound of zero by dropping the 119 non-medal-winning nations, the tobit analysis not only included these observations but used them to better predict actual results, as we see in Table 3.
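The way tobit "uses" the zeros can be seen in its likelihood. Each non-medal nation contributes the probability that its latent medal share falls at or below zero, rather than being dropped. Below is a minimal Python sketch, assuming the standard tobit model left-censored at zero with normal errors and fit by maximum likelihood with SciPy. It runs on simulated data, not the paper's sample or software. The function names `tobit_negloglik`, `fit_tobit` and `expected_share`, along with all parameter values, are hypothetical. The last function also bears on the Haiti case: a negative tobit prediction is the latent index x'b, while the model's expected observed share, Φ(x'b/σ)·x'b + σ·φ(x'b/σ), is always positive.

```python
import numpy as np
from scipy import optimize, stats


def tobit_negloglik(params, X, y):
    """Negative log-likelihood of a tobit model left-censored at zero.

    params = (b_0, ..., b_k, log_sigma); sigma is parameterized in logs so it stays positive.
    """
    beta, sigma = params[:-1], np.exp(params[-1])
    xb = X @ beta
    censored = y <= 0
    # Censored observations (zero medals): P(y* <= 0) = Phi(-xb / sigma)
    ll_censored = stats.norm.logcdf(-xb[censored] / sigma)
    # Uncensored observations: normal density of the standardized residual, times 1/sigma
    ll_observed = stats.norm.logpdf((y[~censored] - xb[~censored]) / sigma) - np.log(sigma)
    return -(ll_censored.sum() + ll_observed.sum())


def fit_tobit(X, y):
    """Maximum-likelihood tobit fit, starting from the OLS estimates."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    start = np.append(beta0, np.log(y.std()))
    result = optimize.minimize(tobit_negloglik, start, args=(X, y), method="BFGS")
    return result.x[:-1], np.exp(result.x[-1])


def expected_share(xb, sigma):
    """E[max(0, y*)] under the tobit model; positive even when the index xb is negative."""
    z = xb / sigma
    return stats.norm.cdf(z) * xb + sigma * stats.norm.pdf(z)


# Simulated example (hypothetical data, not the London 2012 sample).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta, true_sigma = np.array([-0.5, 1.0]), 1.0
y_star = X @ true_beta + true_sigma * rng.normal(size=n)
y = np.maximum(y_star, 0.0)  # censor at zero, like nations that win no medals

beta_hat, sigma_hat = fit_tobit(X, y)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("tobit:", beta_hat.round(3), round(sigma_hat, 3))
print("OLS on censored data:", beta_ols.round(3))
```

On data like these, OLS on the censored outcome yields a noticeably flatter slope than tobit, which mirrors the larger tobit coefficients reported in Table 2.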

Future Work

I ran several regressions on the Sochi 2014 data set, which are shown in Table 4. The dummy variable assigns a 1 to countries that were formerly part of the Soviet Union and a 0 to the other medal winners. We do this because former Soviet nations have routinely outperformed at the winter games relative to what their population size and relatively low GDP per capita would predict. The coefficients on GDP per capita and population are interpreted the same way as in previous models. We interpret the dummy variable coefficient, which we refer to as USSR, as follows: we expect former Soviet countries (USSR = 1) to have a 3.48% greater share of winter Olympic medals than a non-Soviet country, ceteris paribus. There are several political explanations for this statistically significant (p-value = 0.034) outperformance, and they reflect the USSR’s emphasis on Olympic success. This “manufacturing of medals,” as discussed by Bernard and Busse, is most likely due to a centrally planned allocation of resources toward athletic development that, as we see from the regression, contributes to Olympic success (Bernard and Busse 415). Including the USSR dummy variable produces a marked increase in goodness of fit, raising R-squared from .39 in Model 5 to .50 in Model 6. This suggests the importance of controlling for political factors in determining Olympic success.
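Because R-squared can only rise when a regressor is added, it is worth checking that the jump from .39 to .50 survives a penalty for the extra variable. A minimal sketch using SciPy computes adjusted R-squared and a nested F-test from the Table 4 figures. It assumes OLS with an intercept, with 2 slopes in Model 5 and 3 in Model 6. The helper names `adjusted_r2` and `nested_f_test` are hypothetical, and results are approximate because the reported values are rounded.

```python
from scipy import stats


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope coefficients."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def nested_f_test(r2_restricted, r2_full, n, k_full, q):
    """F-test that the q extra coefficients in the full model are jointly zero."""
    df_den = n - k_full - 1
    f_stat = ((r2_full - r2_restricted) / q) / ((1.0 - r2_full) / df_den)
    return f_stat, stats.f.sf(f_stat, q, df_den)


n = 26  # Sochi medal-winning nations (Table 4)
print("adjusted R2, Model 5:", round(adjusted_r2(0.39, n, 2), 3))
print("adjusted R2, Model 6:", round(adjusted_r2(0.50, n, 3), 3))
print("F-test for the USSR dummy:", nested_f_test(0.39, 0.50, n, k_full=3, q=1))

# t statistic on the USSR dummy from Table 4 (.0348 / .015), with n - 3 - 1 df
t_stat = 0.0348 / 0.015
print("t =", round(t_stat, 2), "two-sided p =", 2 * stats.t.sf(abs(t_stat), n - 3 - 1))
```

Adjusted R-squared still rises (about .34 to .43). With one added variable the F-statistic should equal the squared t-statistic; the small gap here (4.84 versus about 5.4) comes from rounding in the reported R-squared and standard error.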

This short discussion of the Soviet dummy variable reflects my interest in finding other new independent variables. The Soviet dummy variable really only scratches the surface of the role that politics plays in Olympic success. While it was not possible to fully explore this topic given the scope of this paper and its limited resources, I believe it would eventually be prudent to include an independent variable that reflects the size of a country’s sports market. Brad Humphreys estimated that the US sports market was worth between $44 and $73 billion in 2005. The sheer amount of money spent on US sports reflects our national interest in athletic prowess and likely helps explain the US outperformance noted in our regression models. As the buzzword “Big Data” permeates business and news, I believe we can expect to see more concrete figures on the amount of money that countries actually pour into sports. With these figures, we should be able to expand our research and get an idea of whether a nation can “spend” its way to Olympic glory.


Appendix

Table 1: Winners Only

Variable | Model 1 | Model 2 (Semilog) | Model 3 | Model 4 (Semilog) | Model 5 (ln pop)
Intercept | .0087* (.00295) | .017 (.0153) | .0045 (.0027) | -.144* (.0232) | .097* (.017)
GDP per Cap ($1000s) | .00014 (.000095) | .0031 (.0016) | .00019* (.00008) | .0049* (.0013) | .00021* (.00008)
Population (10 millions) | | | .00047* (.00009) | .0067* (.0010) | .0063* (.0010)
R-Squared | .02 | .04 | .24 | .36 | .32
Observations | 85 | 85 | 85 | 85 | 85

Notes: Red indicates negative values. p-value < .05 indicated by *. SEs reported in parentheses. Pure models: GDP per capita is in $1000s and population is in units of 10 million people.


Table 2: Everyone

Variable | Model 1 | Model 2 (Semilog) | Model 3 | Model 4 (Semilog) | Model 5 (ln pop) | Tobit
Intercept | .0033* (.00087) | .015* (.0048) | .0013 (.00074) | .067* (.0152) | .038* (.0102) | .021* (.02)
GDP per Cap ($1000s) | .00009 (.00005) | .0023* (.0006) | .00015* (.000048) | .0033* (.00071) | .00014* (.0000476) | .0089* (.0011)
Population (10 millions) | | | .000513* (.000228) | .0028* (.00068) | .0026* (.00068) | .0082* (.0009)
R-Squared | .02 | .07 | .26 | .28 | .22 | N/A
Observations | 204 | 204 | 204 | 204 | 204 | 204

Notes: Red indicates negative values. p-value < .05 indicated by *. SEs reported in parentheses. Pure models: GDP per capita is in $1000s and population is in units of 10 million people.

Model Equations:

#1: Predicted Medal Share=b_0+b_1*GDP per Capita+ ε

#2: Predicted Medal Share= b_0+b_1*ln(GDP per Capita)+ ε

#3: Predicted Medal Share= b_0+b_1*GDP per Capita+b_2*Population+ ε

#4: Predicted Medal Share= b_0+b_1*ln(GDP per Capita)+b_2*(Population)+ ε

#5: Predicted Medal Share= b_0+b_1*GDP per Capita+b_2*ln(Population)+ ε
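The five specifications above differ only in which regressors enter and whether they are logged. Below is a minimal NumPy sketch of building each model's regressors and fitting it by OLS. It assumes ordinary least squares with an intercept and uses simulated placeholder data, not the London 2012 data set. The helper names `design_matrix` and `ols_r2` are hypothetical.

```python
import numpy as np


def design_matrix(model, gdp_pc, pop):
    """Regressors for Models #1-#5 as written above (intercept column first).

    gdp_pc: GDP per capita in $1000s; pop: population in units of 10 million.
    """
    ones = np.ones_like(gdp_pc)
    columns = {
        1: [ones, gdp_pc],
        2: [ones, np.log(gdp_pc)],
        3: [ones, gdp_pc, pop],
        4: [ones, np.log(gdp_pc), pop],
        5: [ones, gdp_pc, np.log(pop)],
    }[model]
    return np.column_stack(columns)


def ols_r2(X, y):
    """OLS coefficients and R-squared via least squares."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    centered = y - y.mean()
    return b, 1.0 - (resid @ resid) / (centered @ centered)


# Placeholder data (hypothetical), only to show the calls.
rng = np.random.default_rng(1)
gdp = rng.lognormal(2.0, 1.0, size=200)
pop = rng.lognormal(-0.5, 1.5, size=200)
share = 0.001 * gdp + 0.002 * np.log(pop) + rng.normal(0.0, 0.01, size=200)

for m in range(1, 6):
    coefs, r2 = ols_r2(design_matrix(m, gdp, pop), share)
    print(f"Model #{m}: coefficients {coefs.round(5)}, R-squared {r2:.3f}")
```

Because ln(c·x) = ln c + ln x, the choice of units (dollars versus $1000s, people versus 10 millions) changes only the intercept in the logged specifications, not the slope on the logged variable.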


Table 3: Actual versus Predicted Medal Share

Country | Actual | Linear | Tobit
US | 10.8% | 2.4% | 4.3%
China | 9.1% | 2.1% | 3.6%
Russia | 8.5% | 1.7% | 2.5%
Canada | 1.8% | 1.8% | 2.5%
Australia | 3.6% | 1.7% | 2.3%
UK | 6.7% | 1.8% | 2.8%
Haiti | 0% | .04% | -2.2%

Summary

Variable | Obs | Mean | Std. Dev.
medshare | 204 | .0049 | .013
gdppercap | 204 | 15,560 | 24,258
pop2010 | 204 | 3.38e+07 | 1.31e+08
loggdppercap | 204 | 8.61 | 1.55
logpop | 204 | 15.33 | 2.31

Figure 1: Fitted values plotted against logpop (x-axis: logpop, 10 to 20; y-axis: fitted values, -.06 to .04; two fitted-value series).


Table 4: Sochi Regressions

Variable | Model 1 | Model 2 (Semilog) | Model 3 | Model 4 (All semilog) | Model 5 (ln pop) | Model 6 (ln pop)
Intercept | .001 (.014) | .244* (.111) | .0087 (.015) | .407* (.128) | .118* (.0585) | .164 (.0577)
GDP per Cap ($1000s) | .00129* (.00042) | .0275* (.0109) | .00145* (.00043) | .031* (.010) | .00135* (.0003) | .00204* (.00047)
Population (10 millions) | | | .000283 (.000214) | .0075* (.0034) | .0067* (.0033) | .0078* (.0030)
USSR dummy | | | | | | .0348* (.015)
R-Squared | .27 | .21 | .33 | .34 | .39 | .50
Observations | 26 | 26 | 26 | 26 | 26 | 26

Notes: Red indicates negative values. p-value < .05 indicated by *. SEs reported in parentheses. Pure models: GDP per capita is in $1000s and population is in units of 10 million people.


Bibliography

Barreto, Humberto, and Frank Howland. Introductory Econometrics: Using Monte Carlo Simulation with Microsoft Excel. New York: Cambridge UP, 2006. Print.

Bernard, Andrew B., and Meghan R. Busse. “Who Wins the Olympic Games: Economic Resources and Medal Totals.” The Review of Economics and Statistics. Vol. 86, No. 1 (Feb. 2004), pp. 413-417. Print.

Davidson, Russell, and James G. MacKinnon. Econometric Theory and Methods. New York: Oxford UP, 2004. Print.

Humphreys, Brad. “The Size and Scope of the Sports Industry in the United States.” IASE Conference Papers. Vol. 833. International Association of Sports Economists, 2008.

Krishna, Anirudh, and Eric Haglund. “Why Do Some Countries Win More Olympic Medals? Lessons for Social Mobility and Poverty Reduction.” Economic and Political Weekly. Vol. 43, No. 28 (Jul. 12-18, 2008), pp. 143-151. Print.

Morton, R. Hugh. “Who Won the Sydney 2000 Olympics? An Allometric Approach.” Journal of the Royal Statistical Society. Series D (The Statistician). Vol. 51, No. 2 (2002), pp. 147-155. Print.

Special thank you to Dr. Howland and Dr. Byun for all of their help on this project!
