An Econometric Explanation of National Olympic Success · 2018. 5. 15. · sense that the largest...
An Econometric Explanation of National Olympic Success London 2012
Ben Shank
ABSTRACT
This paper will explore the economic factors that impact the relative success that a nation has in the Olympic games. Given the time frame of the writing of this paper, I was particularly intrigued by the apparent domination of large, relatively wealthy nations in the 2014 Sochi Winter Games. Even from passive observation of medal counts, one can see that the nations that perform well at the Olympics are the same countries that control the world in terms of size, money and influence. This causes me to look at the variables of GDP and population.
Table of Contents
Introduction
Literature Review
Theoretical Analysis
Data Analysis
Conclusion
Appendix
Bibliography
Introduction
I had hoped to look at Sochi since it occurred at the time I was choosing a
topic; however, I quickly learned that the winter and summer games are very
different. The major difference between Sochi and London is the sheer quantity of
medals available at the summer games versus the winter games. Whereas we have
26 medal winning nations from Sochi, for the London data set, we have 85 nations
represented. There were 962 total medals awarded in London, while only 295 were
awarded in Sochi. This gives us a much more workable data set since we have a
larger sample size. With this many observations, we are much more comfortable
interpreting the t-statistics as we would z-statistics, because with a high “n” the
t-distribution approaches the normal curve. This is
the first benefit of being able to look at the London data.
The second benefit of looking at a summer games is the increased level of
diversity in athletic competitions. While the Dutch were able to take a substantial
medal share due to their sheer dominance in a relatively narrow event group (speed
skating), we do not expect to see this same level of specialization in summer games.
Furthermore, the small Alpine nations in Europe have a huge advantage in skiing
events, which account for a large share of the awardable medals in Winter games,
and this leads to a medal concentration for these nations. In this study, with our
data limitations, we struggle to account for the geographic advantages afforded to
the countries that traditionally excel in winter games. The winter games certainly
have more of an emphasis on physical geography in regards to the types of athletics
we see. A simple comparison of the concentration of high-level skiers in Colorado
versus the level of high-level skiers in Alabama seems ample to explain what we
mean by the physical geography advantage. The bottom line is that we simply will
not find as many good skiers in a flat southern state such as Alabama as we will in a
mountainous state like Colorado. (Of course here we are ignoring the huge
demographic differences between the two states; Alabama is extremely poor
compared to Colorado. Luckily, we account for this in our Olympic regressions!)
Literature Review
This review will narrow in focus as we move from the broadest indicators for
Olympic success to the more peculiar and outlier type factors. Obviously, the “Big
Two” indicators are GDP and population. The following studies provide an outline
of the best way to compile data. The goal is to update this data with results from the
recent games in 2012.
Bernard and Busse examine the relationship between both economic factors
and population size in determining Olympic success. Their model attempts to
account for population by framing Olympic success as the percentage of medals that
a country wins at the games. They examine the medal totals of the Games from
1960-1996, as well as provide their forecast for the 2000 Sydney Games. It would
be very intriguing to look at how their predictions fared now that we can
retroactively look at the data. They postulate, “at the margin, population and high
per capita GDP are needed to generate high medal counts” (Bernard 413). Another
element that Bernard and Busse examine is the role that political attention has on
Olympic success. For example, why have former USSR nations outperformed their
predicted model results based on population and GDP per capita? The answer lies in the
political landscape of these nations; as we saw with Putin and the 2014 Sochi
Games, the Eastern Bloc nations apparently place a heavy emphasis on Olympic
success and allocate resources accordingly. While Bernard and Busse do not
attempt to quantify this allocation, they do use an error term to absorb this source
of variability (Bernard 415). They do, however, attempt to explain the Olympic
investment as a function of lagged medal count. This means that past success
implies future success and that this can be traced to investment. They argue that
this addition improves the fit of the model (Bernard 415).
Their model is essentially a function of population size, GDP per capita and
an error term (Bernard 415). I think it would be interesting to attempt to come up
with a variable that quantifies the size of the sports market in countries. This is very
difficult in lower income countries, but it could be an interesting point of
comparison between some of the Olympic powerhouse nations like the US, Russia,
Germany, and China. “Developing Olympic caliber athletes requires considerable
expenditure on facilities and personnel” (Bernard 414). An interesting part of the
Bernard and Busse study is their use of the log function transformation on
population and GDP per capita. They also find that the host nation typically
outperforms the estimated results from the model by about 1.8%. Perhaps the most
intriguing finding of the study is that total GDP is the crucial determinant in
predicting medal counts. “This suggests that it is a country’s total GDP that matters
in producing Olympic athletes. This in turn has the implication that two countries
with the same GDP will win approximately the same number of medals, even if one
is more populous with lower per capita income and the other is smaller with higher
per capita GDP. Furthermore, there is strong evidence for durability to a country’s
Olympic investments” (Bernard 417).
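As a compact reference, the specification described in this review can be sketched as the equation that follows. The notation is an assumption chosen here for illustration (M for medal share, N for population, Y/N for GDP per capita, C for the constant), and the form is inferred from the description above rather than copied from Bernard and Busse.

```latex
M_{i} = C + \alpha \ln N_{i} + \beta \ln\!\left(\frac{Y_{i}}{N_{i}}\right) + \varepsilon_{i}
```

Under this sketch, the total GDP finding has a simple reading: if the two slopes are equal, then \alpha \ln N_{i} + \beta \ln(Y_{i}/N_{i}) collapses to \beta \ln Y_{i}, so only total GDP enters the prediction.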
Morton examines the 2000 Sydney games in a similar fashion to Bernard and
Busse. However, instead of focusing on overall medal counts, he attempts to weight
medals so as to determine a “winner” (Morton 147). He builds a model of GDP in
millions of USD and populations in millions similar to the model of Bernard and
Busse. The next goal was to measure the residuals as a measure of outperformance
based on the countries’ respective economic and population data. He postulates that
Cuba is the ultimate positive outlier and that India vastly underperformed. We once
again see use of the natural log on population and GDP to improve the goodness of
fit of the data (Morton 148). We include this study given its focus on
overperformers. The next study is helpful in explaining why one of these countries
performed the way it did.
While Bernard and Busse present us with the kind of comprehensive model
we want to emulate, Krishna and Haglund present a case study of India’s Olympic
success (or lack thereof) in comparison to other nations. They question why India’s
one-sixth share of world population and rapidly growing economy has not
translated into even mediocrity at the Olympic Games. India, in comparison to even
the poorest countries, had the lowest medal per citizen ratio in the 2004
Games (Krishna 143). Nigeria, Cuba, and Thailand outperformed India. Given
India’s total GDP (and modestly respectable growing GDP per capita), why have they
performed so poorly at the Games? “Why do 10 million Indians win less than one-
hundredth of one Olympic medal, while 10 million Uzbeks won 4.7 Olympic
medals?” (Krishna 143). The clear point of this paper (it references Bernard and
Busse) is to figure out why India is an outlier in terms of underperformance at the
Games based on its GDP and population.
The explanation for this underperformance comes from what Krishna and
Haglund refer to as the effectively participating population rather than the total
population. The premise of this concept is based on the idea that Olympic athletes
are not coming from the portion of the population that is uneducated, malnourished
and socially discriminated against (Krishna 144). Thus, the belief is that the greater
the proportion of the population that has access to these resources, the greater the
amount of Olympic caliber athletes and therefore Olympic medals for a given nation.
The following passage reflects this logic, “That one billion Indians together won only
one Olympic medal seems otherwise hard to explain. Any explanation based on race
or genetic characteristics seems facile simply on account of the immense diversity
found in India. But if a vast majority of Indians are not effective participants,
possibly because information about these events is available to a tiny number - and
a tinier number yet know where and how to avail themselves of these opportunities
- then a more complete explanation for poor performance comes to hand” (Krishna
144).
What Krishna and Haglund add to our research is a level of economic
characterization beyond that of just personal wealth as indicated by GDP per capita.
They include indicators that reflect the standard of health, education and a measure
they call “connectedness”. Connectedness is meant to reflect the access to
information and thus resources necessary to form the effectively participating
population. The variables for connectedness are “road length per unit of land area,
share of urban population, and radios per capita” (Krishna 144). I believe that it is
crucial to include findings from this study in research due to its attention to one of
the most peculiar outliers. In my study, I am limited to a GDP and population model
although I feel that the model fails to tell the entire story and that much can be
learned from those nations that significantly outperform or underperform based on
their respective “Big Two”. (The Big Two factors are GDP
per capita and population.)
Theoretical Analysis
The key determinants in Olympic success are GDP per capita and population.
If we measure Olympic success by the total medal count, then intuitively it makes
sense that the largest nations will accumulate the most medals. The more athletes
that a given country sends to the games, the more chances they have to earn medals.
Also, with a larger population, a nation is able to pull from a more diverse group of
athletes and has not only a larger talent pool, but also has the ability to specialize
across disciplines given the large number of athletes. The US is a prime example of
this phenomenon. Americans have dominated the games for years in all sorts of
events, from swimming to shooting to speed skating. This comes from our large
population and the diversity of sports focus across the nation.
It helps to think of the large population advantage like we would high school
sports. As a graduate of a tiny private school in Fort Wayne, Indiana, I definitely got
a different view of high school sports than your typical Midwestern teenager. My
school lacked the sheer size to field a football team. With only 87 students in my
graduating class, we simply did not have the number of students necessary to
develop a football program. Lacking the “sports infrastructure” simply kept us off of
the playing field and led our athletically inclined students to pursue other interests.
This same comparison can be made at a country level. Just as Canterbury High
School struggled to find the manpower necessary to field a human capital heavy
sport like football, small countries too will struggle to find the human resources to
develop enough athletes to take a substantial portion of the total medal count.
While Canterbury lacked a football team altogether, the size of the school and
focused efforts did create an environment conducive to specialization in specific
sports. Think of the small mountainous European nations that dominate skiing
events in the Olympics. Think of the Dutch in speed skating. Canterbury used the
same methods; while we couldn’t compete in the sports requiring tons of people (i.e.,
track, football, wrestling), Canterbury excelled in tennis, golf, and soccer. These
sports require substantial skill development and given the location, size and
demographic factors, Canterbury was able to defeat much larger schools in these
arenas.
In addition to not competing, Canterbury also lacked the ability to keep up
with large schools in terms of talent. When playing a large high school of 4,000
students, there was a very evident talent gap between Canterbury and a school such
as Carmel. The sheer size of the schools allowed them to develop and pull talented
athletes from a much larger population. However, unlike high school sports, the
Olympic games are not stratified by population size. Canterbury obviously
struggled to keep pace with the behemoth high schools in the same fashion that a
small nation will struggle to pump out Olympic medals as compared to giants like
Russia, the US and China. The allegorical use of high school sports helps to explain
the theoretical basis for deeming population size as one of our key indicators in
determining Olympic medal count success.
GDP per capita is the other main determinant for medal count success at the
Olympics. Athletic development costs money first and foremost. It costs a lot of
money to build stadiums, purchase equipment, and pay for coaching to name a few
explicit costs. Aside from these explicit costs as determinants for athletic success,
we also need to think about the countries at the margin. Imagine the typical family
in a poor country. Rather than positioning children for athletic success, this family
will be primarily concerned with allocating resources (time and
money) towards procuring food for the children. Getting their son or daughter on
the best travel soccer team or getting advice from the best (which typically implies
expensive) figure skating coach in the area is the least of a poor parent’s worries.
Simply, the lack of time and money to employ toward the development of athletics is
a primary factor for understanding why GDP per capita is a key determinant.
In addition to the simple allocation of funds towards sports, we can also use
GDP per capita as a representative of total health in the country. By almost all
measures, and 500 other papers could be written on this topic alone, wealthier
nations (on a per capita basis) are simply healthier nations. A nation cannot hope to
develop Olympic athletes if it is plagued by malnutrition, infant mortality, and
infectious disease. The above factors will simply wipe out the available athletes.
Malnutrition especially will prevent millions of young people from developing
physically at the rate and efficiency that the average American or Western European
will develop. Although India has over 1.2 billion people, the fact that it has one-third
of the world’s malnourished children gives a glaring explanation for its lack of
Olympic success (Krishna 145). There are worse circumstances of malnutrition
than a lack of Olympic success but this is an example of another reason for the
theoretical inclusion of GDP per capita as an indicator.
Data Analysis
This paper employs Ordinary Least Squares methods in estimating
coefficients on the key independent variables, which determine the dependent
variable, total medal count. In this analysis, I will explore the results of the 2012
London Summer Games. This type of analysis has been done several times before
on past Olympic games, but I believe there is much to be gained by investigating
the determining factors from the most recent summer games.
The dependent variable I use is medal share, which is simply a
country’s total medals won divided by the overall medals that were awarded at the
games. We then look at the impact
that our independent variables have on a country’s medal share in the games. The
data set originally contained economic data on 204 countries, which is effectively
every nation in the world. We run two data sets to look at the fit. In one regression,
we take the 204 countries, all of which had at least one representative at London.
However, given that some countries really aren’t major players in the sense that
they sent maybe one athlete, we attempt to account for this discrepancy by
dropping countries that failed to win a medal. Dropping all these countries is no
doubt an issue for our data analysis, but it proves useful so that we look at the
“athletically relevant” countries that competed for the purposes of this analysis.
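To make the dependent variable concrete, here is a minimal sketch of how medal share and the winners-only sample could be computed. The country names and medal counts are hypothetical placeholders; only the total of 962 medals comes from this paper.

```python
# Total medals awarded in London 2012, as reported in this paper.
TOTAL_MEDALS_LONDON = 962


def medal_share(medals_won: int, total_medals: int = TOTAL_MEDALS_LONDON) -> float:
    """Return a country's medal share as a fraction of all medals awarded."""
    if total_medals <= 0:
        raise ValueError("total_medals must be positive")
    return medals_won / total_medals


# Hypothetical medal counts for illustration only (not actual results).
example_counts = {"Country A": 50, "Country B": 5, "Country C": 0}
shares = {name: medal_share(count) for name, count in example_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.4%} of all medals")

# The "Winners Only" sample keeps only countries with a positive medal share.
winners_only = {name: s for name, s in shares.items() if s > 0}
```

Dividing by the fixed total only rescales the medal count, so the choice between count and share changes the units of the coefficients rather than the ranking of countries.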
I began trying to use total medal count as the primary dependent variable;
however, after running several OLS regressions with total medal count versus medal
share percentage, I found that the R-squared values, a representative statistic of the
goodness of fit of a line demonstrated
For the London 2012 Summer Games, we have the total medal counts, the
breakdown of medal counts, population and GDP data. (The model equations can be
seen in the appendix and in Table 1). In our regressions, we include an error
term in order to account for chance error since we are dealing with random
variables. The error term (represented by epsilon in the Model Equations portion of
the Appendix) also absorbs the effect of omitted variables. When attempting to
predict Olympic success, it is important to remember that there are more variables
at play than simply wealth and size of the nation. Examples of this include a nation’s
geography, climate, political situation, etc. Any number of combinations of these
factors influences the relative Olympic success of a particular nation. Some of the
trends seen in the Bernard and Busse study involve the political affiliations of
traditionally successful nations like Russia and China. They look through time at
these countries’ success and use a dummy variable to acknowledge the apparent
influence of communism on Olympic success. The coefficients on the dummy
variable were highly statistically significant. This provides one example of why we
rely on the error term to absorb the influence of these factors we fail to explicitly
measure in our regressions.
When running OLS regression, we are assuming that the Classical
Econometric Model has not been violated. One of these conditions is that we must
have a homoscedastic distribution of errors. This means that the variance of the
errors does not vary at different values of our independent variables. However, after
seeing from a Breusch-Pagan test that there is evidence of heteroskedasticity (as
shown by p-values near 0), I made the choice to report robust SEs since they
account for heteroskedasticity.
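For readers who want to see the mechanics, here is a minimal sketch of a Breusch-Pagan test and a heteroskedasticity-robust standard error for a one-regressor model. It is not the Stata routine used for this paper: it assumes the studentized (Koenker) form of the test and the HC1 robust variance, and the data are made up so that the error spread grows with x.

```python
import math


def simple_ols(x, y):
    """Fit y = a + b*x by OLS; return (a, b, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    return a, b, resid


def slope_standard_errors(x, resid):
    """Return (classical SE, HC1 robust SE) for the slope coefficient."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    classical = math.sqrt(sigma2 / sxx)
    # Robust "meat": each squared residual weighted by its squared x deviation.
    meat = sum(((xi - x_bar) ** 2) * (e ** 2) for xi, e in zip(x, resid))
    robust = math.sqrt((n / (n - 2)) * meat / sxx ** 2)
    return classical, robust


def breusch_pagan(x, resid):
    """Studentized Breusch-Pagan test: regress squared residuals on x.

    LM = n * R^2 is compared with a chi-square distribution with 1 degree
    of freedom, because there is a single regressor.
    """
    n = len(x)
    e2 = [e ** 2 for e in resid]
    _, _, aux_resid = simple_ols(x, e2)
    e2_bar = sum(e2) / n
    sst = sum((v - e2_bar) ** 2 for v in e2)
    ssr = sum(r ** 2 for r in aux_resid)
    lm = max(0.0, n * (1 - ssr / sst))
    p_value = math.erfc(math.sqrt(lm / 2))  # upper tail of chi-square(1)
    return lm, p_value


# Hypothetical data: the size of the disturbance grows with x.
x = [float(i) for i in range(1, 21)]
y = [0.5 * xi + (0.2 * xi if i % 2 == 0 else -0.2 * xi) for i, xi in enumerate(x)]

a, b, resid = simple_ols(x, y)
classical_se, robust_se = slope_standard_errors(x, resid)
lm, p_value = breusch_pagan(x, resid)
print(f"slope = {b:.3f}, classical SE = {classical_se:.4f}, robust SE = {robust_se:.4f}")
print(f"Breusch-Pagan LM = {lm:.2f}, p-value = {p_value:.5f}")
```

A small p-value from the test is what motivates reporting the robust standard error instead of the classical one; the slope estimate itself is unchanged.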
Winners Only
First we look at GDP per capita as the primary independent variable after
dropping all nations that failed to bring home a medal. In Model 1, we run a pure
regression with this variable on medal share percentage. We put this in units of
$1000s for readability’s sake and due to the fact that an increase of $1 in average
GDP per capita in a nation probably will not “move the lever” on Olympic success.
This first regression gives us an estimated slope coefficient of .00014. This implies
that as GDP per capita increases by $1,000, there will be an expected increase in
medal share at the London Summer Games of about .014% give or take .0095%.
This value is not statistically significant, with a p-value of about 14%.
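The significance check for Model 1 can be reproduced by hand from the two reported numbers. The sketch that follows divides the coefficient by its standard error and converts the result to a two-sided p-value; it assumes the standard normal approximation, whereas the regression output would use a t-distribution, so the figures may differ slightly.

```python
from statistics import NormalDist


def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value using the standard normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# Model 1 figures reported in the text, in percentage points of medal share.
coef_pct = 0.014   # estimated effect of a $1,000 increase in GDP per capita
se_pct = 0.0095    # the reported "give or take" (standard error)

t_model1 = coef_pct / se_pct
p_model1 = two_sided_p_value(t_model1)
print(f"t = {t_model1:.2f}, approximate p-value = {p_model1:.3f}")
```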
In Model 2, we take the natural log of GDP per Capita. The coefficient
therefore implies that a 1% increase in GDP per capita is associated with a 0.031%
increase in medal share count. This value is not quite statistically significant with a
t-statistic of 1.93, which corresponds to a p-value that is effectively 5.7%. Given the
rise in R-squared value from .026 in Model 1 to .043 in Model 2, we have reason to
believe that the natural log transformed data fits better than the linear data.
Model 3 brings population size into the mix. As we discussed above,
population is a definite indicator of how many medals a country will win in a given
year. Model 3 is a pure regression of medal share on population and GDP per capita,
which means that we make no transformations on the data. We interpret the GDP
per capita coefficient of .00019 the same way that we do in Model 1. We interpret
the population coefficient as a 10 million person increase in population will be
associated with a .00048% increase in medal share percentage. Model 3 is definitely
an improvement over Model 2 in that it accounts for the sheer number of athletes in
a country and this yields an R-squared value of .25. This model demonstrates the
effects of omitted variable bias, which in this case stem from our neglect of
population as a determinant in predicting medals won in the earlier models. This
can be seen in the change in the estimated slope coefficient on GDP per capita,
which moves from .00014 in Model 1 to .00019 here once we no longer attribute
every movement in medal share to GDP per capita alone.
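The link between the Model 1 and Model 3 slopes can be written as a standard textbook identity. The notation is assumed here for illustration: the tilde marks the short regression that omits population, the hats mark the Model 3 estimates, and \tilde{\delta}_{1} is the slope from regressing population on GDP per capita.

```latex
\tilde{\beta}_{1} = \hat{\beta}_{1} + \hat{\beta}_{2}\,\tilde{\delta}_{1}
```

The short-regression slope therefore differs from the Model 3 slope by \hat{\beta}_{2}\tilde{\delta}_{1}, so the direction of the bias depends on the sign of the population coefficient and on how population and GDP per capita move together in the sample.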
Model 4 employs semilog model techniques in that we take the natural log of
our two independent variables: population and GDP per capita. This results in us
interpreting the population coefficient as a 10% increase in population size is
associated with a .067% increase in medal share for a given nation. A 10% increase in GDP per
capita yields a .049% increase in the share of medals won at the 2012 London games. This
regression yields a much better R-squared value than that which we find in Model 3,
at .37.
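The percentage interpretations used for the log-transformed models follow from a short calculation, sketched here. The coefficient value is hypothetical and is not taken from Table 1; the point is only to compare the exact effect of a percentage increase with the usual rule of thumb.

```python
import math


def level_log_effect(beta: float, pct_increase: float) -> float:
    """Exact change in y when x rises by pct_increase percent in y = a + beta*ln(x)."""
    return beta * math.log(1 + pct_increase / 100)


def level_log_effect_approx(beta: float, pct_increase: float) -> float:
    """Rule-of-thumb approximation: beta * (pct_increase / 100)."""
    return beta * pct_increase / 100


beta_hypothetical = 0.05  # illustrative value only, not an estimate from Table 1
for pct in (1, 10, 50):
    exact = level_log_effect(beta_hypothetical, pct)
    approx = level_log_effect_approx(beta_hypothetical, pct)
    print(f"{pct:>3}% increase: exact {exact:.5f}, approximation {approx:.5f}")
```

The approximation is close for small changes and overstates the effect as the percentage increase grows, which is worth keeping in mind when reading the 10% interpretations.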
I initially believed that Model 5 would serve as an improvement over Model
4. I thought that undoing the logarithmic transformation on GDP per Capita, but
leaving it on population would improve the fit of our data. However, when we run
the regression, we actually find that the R-squared value falls to .32. This provides
further evidence that the logarithmic transformation to GDP per Capita is in fact
appropriate on the basis of fit.
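For quick reference, the five specifications described in this section can be summarized as follows. The symbols are assumptions made here for readability (S for medal share, G for GDP per capita in $1000s, P for population), and this summary restates the prose rather than reproducing the Model Equations in the Appendix.

```latex
\begin{aligned}
\text{Model 1:}\quad & S_i = \beta_0 + \beta_1 G_i + \varepsilon_i \\
\text{Model 2:}\quad & S_i = \beta_0 + \beta_1 \ln G_i + \varepsilon_i \\
\text{Model 3:}\quad & S_i = \beta_0 + \beta_1 G_i + \beta_2 P_i + \varepsilon_i \\
\text{Model 4:}\quad & S_i = \beta_0 + \beta_1 \ln G_i + \beta_2 \ln P_i + \varepsilon_i \\
\text{Model 5:}\quad & S_i = \beta_0 + \beta_1 G_i + \beta_2 \ln P_i + \varepsilon_i
\end{aligned}
```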
All Participants
It would be mind-numbing to repeat the interpretations of each coefficient in
each model since all that we did was include the 119 (204-85) non medal winners in
this second set of regressions found in Table 2. So in sparing both the readers and
myself, we will discuss the implications of this second set. When we run the
regressions, we quickly realize that across the board, our estimated coefficients
have fallen. This can be seen in a side-by-side comparison of Table 1 and Table 2;
within the first glance, we see that the values are smaller. In this set of regressions,
we also find that our R-squared values have fallen. We attribute this to the greatly
increased number of observations from 85 to 204. As we add back in a bunch of non
medal winners with actual medal share equal to 0%, we recognize that our fitted
OLS regression will not fit as well. Even with modest income and
population, most countries reach the range in which we expect them to
garner at least some portion of the medals. What this means is that we predict that
many countries will win some, however small, portion of the medals available. So
every time one of these non-winners posts a 0 in medal share, we poorly estimate
the medal share count, which is reflected in the noticeably smaller R-squared values
across the board.
By including the non-winners, we quickly realize the implications of our
lower bound value of 0. While this leads to headaches in our plain vanilla OLS
regressions, we do have a valuable resource to alleviate this headache (thank you to
Dr. Howland). This resource is tobit analysis; tobit analysis is used in cases in
which the dependent variable has a definite lower or upper bound. Since medal share is a
percentage, this implies that there is a lower bound of 0% and an upper bound of
100%. We don’t really worry about the upper bound (I do not think that even
the most spirited USA! chants will push us to the point of winning every single
medal at the Olympics, a 100% medal share), but we are very aware of the lower bound
value of 0%. We are so aware of this value that we dropped 119 observations for
one of our sets of regressions.
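To show how a tobit model treats the zeros differently from OLS, here is a minimal sketch of the log-likelihood for a model censored from below at 0. It is an illustration under stated assumptions, not the Stata estimation used in this paper: the data and both sets of candidate parameters are hypothetical, and no optimizer is included.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def tobit_log_likelihood(y, X, beta, sigma, lower=0.0):
    """Log-likelihood of a tobit model left-censored at `lower`.

    y: observed outcomes; X: rows of regressors (include 1.0 for the constant);
    beta: coefficients; sigma: standard deviation of the latent error.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for yi, xi in zip(y, X):
        index = sum(b * v for b, v in zip(beta, xi))  # latent mean x'beta
        if yi <= lower:
            # Censored observation: probability that the latent value sits at
            # or below the bound.
            total += math.log(_STD_NORMAL.cdf((lower - index) / sigma))
        else:
            # Uncensored observation: normal density of the observed value.
            z = (yi - index) / sigma
            total += math.log(_STD_NORMAL.pdf(z) / sigma)
    return total


# Hypothetical data: two non-winners (share of 0) and two medal winners.
y = [0.0, 0.0, 0.01, 0.05]
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]

# Two hypothetical parameter guesses: an upward-sloping line that dips below
# zero for the non-winners, and a flat line through the middle of the data.
ll_sloped = tobit_log_likelihood(y, X, beta=[-0.03, 0.018], sigma=0.01)
ll_flat = tobit_log_likelihood(y, X, beta=[0.015, 0.0], sigma=0.01)
print(f"sloped line: {ll_sloped:.3f}, flat line: {ll_flat:.3f}")
```

The censored observations contribute a probability rather than a density, so a line whose latent prediction is negative for the non-winners is rewarded instead of penalized.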
Since we already have somewhat of an idea that Model 4, with its semilog
functional form, is the best fitting regression, we will conduct tobit analysis on
Model 4 alone. The tobit analysis allows us to account for this lower bound value
instead of altogether dropping these values like we do in the Winners Only analysis.
The results are shown in the Tobit column of Table 2. We see from the results that
our slope coefficients on both GDP per Capita and population are significantly
greater than in the other two sets. Since both variables have a logarithmic
transformation we interpret the coefficients just as we would in a semilog model.
Thus, we associate a 10% increase in GDP per Capita with a .89% increase in medal
share count. Given that there are 962 available medals, we associate this 10% GDP
per Capita increase with an increase in total medal count of about 8.5 medals. So if a
nation wants to increase its medal count by 8.5 medals, they should plan to increase
the general wealth of the nation by about 10%. Similarly for population, we
associate a 10% increase in population with about an increase in medal share count
of about .82%. This implies that increasing a nation’s population by 10% will cause
us to predict them winning about 7.9 more medals in the 2012 Olympics. The key
difference we see with the tobit model is in magnitude of these slope coefficients.
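The conversion from medal share to medal count used above is simple arithmetic, shown here as a short worked example. It uses only figures stated in the text (962 medals, .89% and .82%) and treats the share changes as percentage points, which is an assumption about how the coefficients are reported.

```python
TOTAL_MEDALS = 962  # medals awarded in London 2012, per this paper


def share_change_to_medals(share_change_pct: float, total_medals: int = TOTAL_MEDALS) -> float:
    """Convert a change in medal share (in percentage points) into a medal count."""
    return share_change_pct / 100 * total_medals


gdp_effect = share_change_to_medals(0.89)         # 10% increase in GDP per capita
population_effect = share_change_to_medals(0.82)  # 10% increase in population
print(f"GDP per capita: about {gdp_effect:.1f} medals")
print(f"Population: about {population_effect:.1f} medals")
```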
This discrepancy is especially evident when we view Graph 1 that was
generated in Stata. On the y-axis, we measure the predicted medal share of the
nation. On the x-axis, we have the natural log of the population. The red marks
represent the values predicted by the tobit model and the blue marks represent the
linear predictions generated by OLS regression. Several things jump out at us by
viewing the graph. First, we witness the lower boundedness evident in OLS
regression by noticing that the blue marks do not really fall below 0 (except in some
extreme cases when the negative constant value overwhelms the independent
variable values of GDP per Capita and population). Second, we notice that GDP per
Capita does not explicitly appear on the graph. While it is not measured on an axis,
we interpret the relative location of the marks in terms of vertical distance as
measures of GDP per Capita. Perhaps the best way to demonstrate this is to look at
China and India. In both the tobit and OLS models, China and India are represented
by the marks on the far right of the graph due to their massive populations of over 1
billion people. However, even though we see that they are roughly the same
size, China’s mark is substantially higher on the y-axis due to the models’
predictions that China will win a greater share of the medal count. This
vertical difference is effectively the difference in GDP per Capita as implied in the
models. Thus, although China and India are about the same size, our models predict
that China will win more medals than India due to China’s higher GDP per Capita.
Finally, we see that the red tobit values have a much steeper slope than the
flatter linear regression values. This comes from the fact that we have accounted for
all the non medal winners by effectively weighting them in a manner that says, if
mathematically possible, these countries’ economic indicators are so poor for
Olympic success that they would actually have a negative medal share count.
Clearly, a country cannot win a negative share of medals, but if there were some way
to index Olympic success based on these predictions, we could measure just how
poorly these lower end countries performed. Essentially, the 0% medal share is like
a stop loss in that the y-value really would be worse if we had a better indicator of their
athletic prowess. Of course, until we start living in a more perfect world, this is not
really possible given the limitations of math and statistics. However, once we reach
this utopia, econometrics will not be taught because empirical research will
perfectly match theoretical predictions.
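The idea of a negative predicted medal share can be made concrete by separating the latent tobit prediction from the expected observed value. The sketch that follows uses the standard formula for a model censored at 0; the coefficients, sigma, and regressor values are hypothetical and are not the estimates in Table 2.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def latent_prediction(x_row, beta):
    """Latent index x'beta, which is allowed to be negative."""
    return sum(b * v for b, v in zip(beta, x_row))


def expected_observed(x_row, beta, sigma):
    """E[y | x] for a tobit model censored from below at 0.

    Combines the probability of being above the bound with the expected
    value of the outcome when it is above the bound.
    """
    index = latent_prediction(x_row, beta)
    z = index / sigma
    return _STD_NORMAL.cdf(z) * index + sigma * _STD_NORMAL.pdf(z)


# Hypothetical country with weak economic indicators.
beta = [-0.03, 0.018]   # constant and one slope, illustrative only
sigma = 0.01
x_row = [1.0, 1.0]

latent = latent_prediction(x_row, beta)
expected = expected_observed(x_row, beta, sigma)
print(f"latent prediction: {latent:.4f}, expected observed share: {expected:.5f}")
```

The latent value plays the role of the "stop loss" described above: it can fall below zero even though the expected observed share stays positive.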
In conclusion, we find that the tobit semilog of GDP per Capita and
population is the best model. As we discussed above, the tobit model allows us to
account for the lower boundedness of medal share better than dropping all of the
non winners. Simply taking all of the nations represented gives us slope
coefficients that are too weak in magnitude: roughly one-third of those in
the tobit model and almost one-half of those in the dropped losers model. These
stronger values give us more confidence in predicting values.
Table 3 provides us with a list of some of the interesting countries and their
actual medal share, linear predicted medal share and tobit medal share. The table
provides us with a more concrete idea of how well these predictors do. I used
several outliers in the table, which makes both models look worse than they actually
are. Clearly, we see that in the case of countries that outperformed the models (US,
China, Russia), that the tobit model was much closer in predicting the actual medal
share than the simple linear regression. Also, in looking at the table we see an
example of a negative tobit prediction value. Haiti, with all of its issues, is predicted
by the tobit model to actually earn a negative share of the medals. While this is of
course not possible in practice, it illustrates the point we made earlier about
boundedness and can also be seen in Graph 1. I include Canada simply because it
performs almost exactly as predicted by the linear model and slightly underperforms
the tobit model. One interesting variable that I did not account for, but that past
studies have included, is a dummy variable for the host nation. The Bernard
and Busse study looks at the games over time and includes a dummy variable for
host nations. They conclude that being the host nation gives a substantial bump and
that nations are expected to take roughly 4% more of the medal share than they would
have had they not been the host. This factor could help to explain the outperformance of the UK since
they hosted the games and had roughly 4% more of the medal share than predicted
by the tobit model.
Previous Work and Possible Errors
My data appears to be roughly consistent with prior economists’ work. As
expected, we find that GDP per Capita and population are highly relevant factors in
predicting Olympic success. This has been demonstrated through multiple
regressions using both OLS and tobit techniques as well as natural logarithmic
transformations. I effectively replicated the Bernard and Busse study in that I used
tobit regression on the natural logarithms of GDP per Capita and population in order
to estimate medal share for the 2012 London Olympics. With their more extensive
resources, Bernard and Busse include several other variables that I wish I could
have employed in my data set. Originally, I researched the 2014 Winter Games and
noticed that the small alpine nations and the former Soviet Union nations seriously
outperformed the models' predictions. These outliers are certainly noteworthy and
suggest that important variables I have mentioned throughout the paper, such as
politics, geography and climate, have been excluded. Bernard and Busse also looked
at the Olympics through time, whereas I only look at London. Since I only looked at
one Summer Games, there is some worry that my results are not representative of
Olympic success as a whole.
Conclusion
The obvious takeaway from this paper is that the wealth and size of a nation
drive its Olympic success. We see this in our models, as the slope coefficients on
GDP per Capita and population are routinely statistically significant. We expected
this result, and it seemed elementary to propose a null hypothesis that these slope
coefficients are equal to 0. Had we done so, we would of course have rejected the
null based on the results of the F-tests of our regressions. The p-values
associated with these F-tests (the probability of observing an F-statistic at least
this large if all of the slope coefficients were truly zero) were very small, which
gave us
reason to reject the null hypothesis that GDP per capita and population did not play
a role in determining medal share.
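The paper does not report the F-statistics themselves, but the overall F-statistic can be recovered approximately from a regression's R-squared, its number of observations and its number of slope coefficients. The sketch below applies the standard formula to Model 4 of Table 2. Because the reported R-squared is rounded to two decimals, the result is only approximate, and the function name is illustrative.

```python
def overall_f_statistic(r_squared, n_obs, n_slopes):
    """F-statistic for H0: all slope coefficients equal zero.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k slopes and n observations.
    """
    numerator = r_squared / n_slopes
    denominator = (1.0 - r_squared) / (n_obs - n_slopes - 1)
    return numerator / denominator


# Table 2, Model 4: R-squared = .28, 204 observations, 2 slope coefficients.
f_model_4 = overall_f_statistic(0.28, 204, 2)
print(f"F(2, 201) = {f_model_4:.1f}")
```

For comparison, the 5% critical value of an F distribution with 2 and 201 degrees of freedom is roughly 3, so a statistic of this size corresponds to a very small p-value.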
Given that our independent variables are highly influential, the next
important goal was finding the best model to estimate the actual results. Using
natural logarithmic transformations to better fit the data was crucial in finding the
best model. While I expected the natural logarithm function to work well on the
population variable, I was surprised to find that the transformation also improved
the fit of GDP per Capita. This suggests that the relationship between medal share
and the two independent variables is logarithmic in nature, as shown by the
improvement in R-squared values following the transformation (for example, from .24
in Model 3 to .36 in Model 4 of Table 1).
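To make the semilog interpretation concrete, the sketch below uses the slope on ln(GDP per Capita) from Model 4 of Table 2 (.0033). In a semilog model the predicted change in medal share depends only on the ratio by which GDP per capita changes, not on its level. The function name is illustrative.

```python
import math

# Slope on ln(GDP per capita) from Table 2, Model 4.
B_LN_GDP = 0.0033


def share_change_from_gdp_ratio(ratio, slope=B_LN_GDP):
    """Change in predicted medal share when GDP per capita is multiplied by `ratio`."""
    return slope * math.log(ratio)


doubling = share_change_from_gdp_ratio(2.0)
one_percent = share_change_from_gdp_ratio(1.01)
print(f"Doubling GDP per capita: {100 * doubling:+.3f} percentage points")
print(f"A 1% increase:           {100 * one_percent:+.5f} percentage points")
```

Under this model, doubling GDP per capita adds the same amount to predicted medal share whether a country starts at $1,000 or $30,000 per person, which is the sense in which the relationship is logarithmic.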
The crucial development in my analysis was to employ tobit regression
rather than the “standard” OLS regression methods we used in class. The tobit
model produced larger slope coefficients on the independent variables, which in
turn allowed us to better predict the actual medal shares. Whereas we dealt with
the lower bound value of zero in the linear model by dropping the 119 countries that
won no medals, tobit analysis not only included these values, but used them to
better predict actual results, as we see in Table 3.
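The way tobit uses the zero-medal countries can be seen in its log-likelihood. The sketch below is a minimal textbook version of a tobit model left-censored at zero, not the routine used to produce Table 2. Medal winners contribute a normal density, while non-winners contribute the probability that their latent medal share falls at or below zero. The second function shows why a negative latent prediction, such as Haiti's, still implies a non-negative expected medal share. All names are illustrative, and the value of sigma is a hypothetical assumption because the paper does not report it.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_log_likelihood(y, index, sigma):
    """Log-likelihood of a tobit model left-censored at zero.

    y     -- observed medal shares (0 for countries with no medals)
    index -- fitted latent values x'b for each country
    sigma -- standard deviation of the latent error term
    """
    total = 0.0
    for y_i, xb_i in zip(y, index):
        if y_i > 0:
            # Uncensored: density of the observed share.
            total += math.log(normal_pdf((y_i - xb_i) / sigma) / sigma)
        else:
            # Censored: probability that the latent share is at or below zero.
            total += math.log(normal_cdf(-xb_i / sigma))
    return total


def expected_observed_share(xb, sigma):
    """E[max(0, y*)] when y* ~ Normal(xb, sigma^2); never negative."""
    z = xb / sigma
    return normal_cdf(z) * xb + sigma * normal_pdf(z)


# Hypothetical values: a latent prediction of -2.2% (as for Haiti in Table 3)
# and an assumed sigma of 2%, which the paper does not report.
haiti_expected = expected_observed_share(-0.022, 0.02)
print(f"Expected observed share: {haiti_expected:.4f}")
```

Dropping the non-winners, as the winners-only linear models do, discards the censored terms of this likelihood entirely.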
Future Work
I ran several regressions on the Sochi 2014 data set, which are shown in
Table 4. The dummy variable assigns a 1 to countries that were formerly a part of
the Soviet Union and a 0 to the other medal winners. We do this because former
Eastern Bloc nations have routinely won more medals at the winter games than their
population size and relatively low GDP per capita would predict. The coefficients on
GDP per
capita and population are interpreted the same way we have interpreted them in
previous models. We interpret the dummy variable coefficient, which we refer to as
USSR, as follows: we expect a former Soviet country (indicated by USSR=1) to have a
share of Winter Olympic medals 3.48 percentage points greater than that of a
non-Soviet country, ceteris paribus. There are several political explanations for
this statistically significant (p-value < .05, as marked in Table 4) outperformance,
and they reflect the USSR’s emphasis on Olympic success.
This “manufacturing of medals” as discussed by Bernard and Busse is most likely
due to a centrally planned allocation of resources towards athletic development
that, as we see from the regression, contributes to Olympic success (Bernard and
Busse 415). By including the USSR dummy variable, we see a marked increase in
goodness of fit that bumps our R-squared from .39 in Model 5 to .50 in Model 6. This
demonstrates the need to control for political factors in determining Olympic
success.
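The significance of the USSR coefficient can be checked directly from Table 4. The sketch below divides the Model 6 coefficient by its standard error and compares the resulting t-statistic with the two-sided 5% critical value for 22 degrees of freedom (26 observations minus 4 estimated parameters). The critical value of 2.074 comes from a standard t table and is an assumption brought in for illustration, as are the variable names.

```python
# Model 6 in Table 4: USSR dummy coefficient and its standard error.
USSR_COEF = 0.0348
USSR_SE = 0.015
N_OBS = 26
N_PARAMS = 4  # intercept, GDP per capita, ln(population), USSR dummy

# Two-sided 5% critical value of Student's t with 22 degrees of freedom,
# taken from a standard t table (an assumption, not reported in the paper).
T_CRITICAL_22 = 2.074

t_stat = USSR_COEF / USSR_SE
degrees_of_freedom = N_OBS - N_PARAMS
significant = abs(t_stat) > T_CRITICAL_22

print(f"t = {t_stat:.2f} on {degrees_of_freedom} degrees of freedom")
print(f"Significant at the 5% level: {significant}")
print(f"Estimated Soviet effect: {100 * USSR_COEF:.2f} percentage points of medal share")
```

With only 26 observations the margin over the critical value is modest, so the estimate is best read as suggestive.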
This short discussion of the Soviet dummy variable reflects my interest in
finding other new independent variables. The Soviet dummy variable really only
scratches the surface of the role that politics plays in Olympic success. While it
was not possible to fully explore this topic given the scope of this paper and the
limited resources, I believe that an independent variable reflecting the size of a
country’s sports market would eventually be prudent to include. Brad Humphreys
estimated that the size of the US sports market was between $44 and $73 billion in
2005. The sheer amount of money spent on US sports reflects a national interest in
athletic prowess and may help explain the US outperformance noted in our regression
models. As the buzzword “Big Data”
permeates business and news, I believe we can expect to start seeing more concrete
figures on the amount of money that countries actually pour into sports. With this,
we will be able to expand our research and hopefully get an idea of whether a nation
can “spend” its way to Olympic glory.
Appendix
Table 1: Winners Only
Variable          Model 1     Model 2     Model 3     Model 4     Model 5
                              Semilog                 Semilog     lnpop
Intercept         .0087*      .017        .0045       -.144*      .097*
                  (.00295)    (.0153)     (.0027)     (.0232)     (.017)
GDP per Cap       .00014      .0031       .00019*     .0049*      .00021*
($1000s)          (.000095)   (.0016)     (.00008)    (.0013)     (.00008)
Population                                .00047*     .0067*      .0063*
(in 10 Millions)                          (.00009)    (.0010)     (.0010)
R-Squared         .02         .04         .24         .36         .32
Observations      85          85          85          85          85
Notes: Red indicates negative values. p-value<.05 indicated by *. SEs reported in parentheses.
Pure models: GDP per capita is in $1000s and population is in units of 10 million people.
Table 2: Everyone
Variable          Model 1     Model 2     Model 3     Model 4     Model 5     Tobit
                              Semilog                 Semilog     lnpop
Intercept         .0033*      .015*       .0013       .067*       .038*       .021*
                  (.00087)    (.0048)     (.00074)    (.0152)     (.0102)     (.02)
GDP per Cap       .00009      .0023*      .00015*     .0033*      .00014*     .0089*
($1000s)          (.00005)    (.0006)     (.000048)   (.00071)    (.0000476)  (.0011)
Population                                .000513*    .0028*      .0026*      .0082*
(in 10 Millions)                          (.000228)   (.00068)    (.00068)    (.0009)
R-Squared         .02         .07         .26         .28         .22         N/A
Observations      204         204         204         204         204         204
Notes: Red indicates negative values. p-value<.05 indicated by *. SEs reported in parentheses.
Pure models: GDP per capita is in $1000s and population is in units of 10 million people.
Model Equations:
#1: Medal Share = b_0 + b_1*GDP per Capita + ε
#2: Medal Share = b_0 + b_1*ln(GDP per Capita) + ε
#3: Medal Share = b_0 + b_1*GDP per Capita + b_2*Population + ε
#4: Medal Share = b_0 + b_1*ln(GDP per Capita) + b_2*Population + ε
#5: Medal Share = b_0 + b_1*GDP per Capita + b_2*ln(Population) + ε
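As an illustration of how these equations turn into a prediction, the sketch below evaluates Model 3 with the coefficients reported in Table 2. Note the units: GDP per capita enters in thousands of dollars and population in tens of millions of people. The country used here is hypothetical, and the function name is illustrative.

```python
# Coefficients for Model 3 in Table 2 (all 204 countries).
INTERCEPT = 0.0013
B_GDP_PER_CAP = 0.00015  # per $1,000 of GDP per capita
B_POPULATION = 0.000513  # per 10 million people


def predict_medal_share(gdp_per_capita_usd, population):
    """Predicted medal share (as a fraction) under Model 3."""
    gdp_thousands = gdp_per_capita_usd / 1_000
    pop_ten_millions = population / 10_000_000
    return INTERCEPT + B_GDP_PER_CAP * gdp_thousands + B_POPULATION * pop_ten_millions


# Hypothetical country: $40,000 GDP per capita and 60 million people.
share = predict_medal_share(40_000, 60_000_000)
print(f"Predicted medal share: {100 * share:.2f}%")
```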
Table 3: Actual versus Predicted Medal Share
Country      Actual    Linear    Tobit
US           10.8%     2.4%      4.3%
China        9.1%      2.1%      3.6%
Russia       8.5%      1.7%      2.5%
Canada       1.8%      1.8%      2.5%
Australia    3.6%      1.7%      2.3%
UK           6.7%      1.8%      2.8%
Haiti        0%        .04%      -2.2%
Summary
Variable        Obs    Mean        Std. Dev.
medshare        204    .0049       .013
gdppercap       204    15,560      24,258
pop2010         204    3.38e+07    1.31e+08
loggdppercap    204    8.61        1.55
logpop          204    15.33       2.31
Figure 1: Fitted values plotted against logpop (two series of fitted values; vertical axis from -.06 to .04, horizontal axis from 10 to 20)
Table 4: Sochi Regressions
Variable          Model 1     Model 2     Model 3     Model 4     Model 5     Model 6
                              Semilog                 All semilog lnpop       lnpop
Intercept         .001        .244*       .0087       .407*       .118*       .164
                  (.014)      (.111)      (.015)      (.128)      (.0585)     (.0577)
GDP per Cap       .00129*     .0275*      .00145*     .031*       .00135*     .00204*
($1000s)          (.00042)    (.0109)     (.00043)    (.010)      (.0003)     (.00047)
Population                                .000283     .0075*      .0067*      .0078*
(in 10 Millions)                          (.000214)   (.0034)     (.0033)     (.0030)
USSR dummy                                                                    .0348*
                                                                              (.015)
R-Squared         .27         .21         .33         .34         .39         .50
Observations      26          26          26          26          26          26
Notes: Red indicates negative values. p-value<.05 indicated by *. SEs reported in parentheses.
Pure models: GDP per capita is in $1000s and population is in units of 10 million people.
Bibliography
Barreto, Humberto, and Frank Howland. Introductory Econometrics: Using Monte
Carlo Simulation with Microsoft Excel. New York: Cambridge UP, 2006. Print.
Bernard, Andrew B., and Meghan R. Busse. “Who Wins the Olympic Games:
Economic Resources and Medal Totals.” The Review of Economics and
Statistics. Vol. 86, No. 1 (Feb., 2004), pp. 413-417. Print.
Davidson, Russell, and James G. MacKinnon. Econometric Theory and Methods. New
York: Oxford, 2004. Print.
Humphreys, Brad. “The Size and Scope of the Sports Industry in the United States.”
IASE Conference Papers. Vol. 833. International Association of Sports
Economists, 2008.
Krishna, Anirudh, and Eric Haglund. “Why Do Some Countries Win More Olympic
Medals? Lessons for Social Mobility and Poverty Reduction.” Economic and
Political Weekly. Vol. 43, No. 28 (Jul. 12-18, 2008), pp. 143-151. Print.
Morton, R. Hugh. “Who Won the Sydney 2000 Olympics? An Allometric Approach.”
Journal of the Royal Statistical Society. Series D (The Statistician).
Vol. 51, No. 2 (2002), pp. 147-155. Print.
Special thank you to Dr. Howland and Dr. Byun for all of their help on this project!