Abnormal Returns in Gold and Silver Exchange Traded...

Abnormal Returns in Gold and Silver

Exchange Traded Funds

Michael J. Naylor*, Udomsak Wongchoti and Chris Gianotti

School of Economics & Finance

College of Business

Massey University

Private Bag 11 222

Palmerston North

New Zealand.

* Corresponding author; [email protected]. T: +64-6-3509099 x7720

___________________________________________________

Acknowledgements

We wish to thank departmental colleagues for comments.

Version 1.0: 8th Feb 2011

mailto:[email protected]

~ 2 ~

Abnormal Returns in Gold and Silver Exchange Traded Funds

____________________________________________________

Abstract

Exchange Traded Funds (ETFs) are one of the fastest growing areas of

investing and have significantly changed investor behavior, yet there is limited

academic research on ETFs, with minimal on commodity based ETFs. This paper

is the first to examine whether abnormal returns are available for Gold and

Silver ETFS.

Gold and silver ETFs have attracted substantial investments, with $40.5 billion

USD in gold (GLD) is and $5.22 billion in silver (SLV). Our study shows that

while inefficiency is present in the gold and silver ETF markets for the period

2006-2009, when risk is factored the abnormal returns evaporate. However,

use of simple filter trading rules does allow abnormal returns. GLD and SLV

exhibit similar characteristics to the underlying physical assets and when

ignoring risk are able to provide investors with greater profit than if investing in

the market.

JEL classification: G10, G11, G12, G14

Keywords: Abnormal returns, exchange traded funds, commodities, filter trading,

~ 3 ~

1 Introduction

1.1 Overview

Exchange Traded Funds (ETFs) are one of the fastest growing areas of

financial markets and have significantly changed how investors at both an

institutional and retail level construct their investment portfolios investor

behavior. The rapid growth raises the question as to whether it is due to

superior characteristics or whether an abnormal return is present. However

there is relatively little academic research on ETFs (Anderson, Born &

Schnusenberg, 2008), and nearly no research on commodity ETFs, the fastest

growing area. Very little is known about the dynamics of this market. This paper

is the first to examine whether prices in commodity exchange traded funds in

two main funds GLD (Gold) and SLV (silver) accurately reflect all the relevant

information in the market, exhibit dependence and offer abnormal return on a

risk adjusted basis. We also examine four smaller funds.

The study is also unique in that it applies existing knowledge of physical gold

and silver prices/returns to examine whether the return properties and

characteristics for gold and silver ETFs perform in the same way as the physical

asset.

1.2 The Exchange Traded Fund Market

ETFs are similar to mutual funds, except that shares in an ETF can be bought

and sold throughout the day and ETFs do not sell or redeem their individual

shares at net asset value. Instead, financial institutions purchase and redeem

ETF shares directly from the ETF in large blocks, called "creation units". ETFs

generally provide the easy diversification, low expense ratios1, transparent

portfolios and tax efficiency as index funds, while still maintaining all the

features of ordinary stock, such as limit orders, short selling, and options. ETF

funds also do not have to maintain a cash reserve for redemptions or pay

brokerage expenses so do not charge a load.

EFT prices are set by markets and not based on day-end asset values. If there

is strong investor demand for an ETF, its share price will rise above its net

asset value per share, giving arbitrageurs an incentive to purchase additional

creation units from the ETF and sell the component ETF shares in the open

1 Annual management fees are 0.3-.6%, vs mutual funds at 1-2% (Kelleher, 2010).

~ 4 ~

market. The additional supply of ETF shares increases the ETF's market

capitalization and reduces the market price per share, generally eliminating the

premium over net asset value. ETFs are dependent on the efficacy of the

arbitrage mechanism in order for their share price to track net asset value.

This growth reflects the popularity of the ETF as an investment vehicle for both

retail and institutional investors as an easy to use way to diversify across a

broader range of securities. The value of total ETF assets under management at

January 2010 was US$1,032B, up from US$710B in December 2008 (Fuhr,

2010). At the current growth rate ETFs will rapidly eclipse that of mutual funds.

The fastest growing sector of ETFs has been commodities as the market is

active and offers diversification. The first commodity ETF was established in

2002 and by January 2010 the GLD fund had grown to US$ 40.5B and SLV fund

to US$5.22B (The largest is S&P500 fund SPY with US$78B). Both GLD and SLV

physically own the commodity.

The impact of the rapid growth of commodity gold and silver ETFs is uncertain;

the rapid growth in ETF assets can increase underlying demand, while supply is

unchanged. Alternatively commodity ETFs may be substitutes for existing

demand for the product. Brikner & Collins (2008) argue that growth in ETFs has

boosted speculation in markets and creating unstable price fluctuations. The

ETFs themselves are also speculators.

2 Literature on the Efficiency of the Physical and ETFs markets

Aggarwal & Soenen (1988) examines gold market efficiency vs the S&P 500.

They argue that the strong movement in gold prices offer an opportunity for

strong returns even risk adjusted, though there was significant positive

skewness and daily changes was leptokurtic. However the inefficiencies

diminished under the CAPM model and that when filter tests were applied the

excess return likewise evaporated.

Solt & Swanson (1981) note the importance of including gold and silver in a

risk return framework and note that gold and silver returns show a low

correlation to the market. They thus conclude that gold and silver have a

speculative rather than investment value.

Fama & Blume (1973) developed filter trading rules and examined the returns

of 24 filters concluding that the evidence indicated trading rules do not

consistently earn returns greater to those of a simple passive (buy and hold)

~ 5 ~

approach. The filter trading rules are x%; if daily closing price (of gold or silver)

increases x% , buy and hold the commodity, until it’s price moves down at least

x% from the subsequent high, at which time the investor should simultaneously

sell and go short. The short position is maintained until the daily close price

rises at least x% above the subsequent low at which time the investors buys.

De Jong & Rhee (2008) found that ETFs in general provide abnormal risk

adjusted returns based on Fama and French 3 factor model exceed transaction

costs. De Jong & Rhee do not however distinguish commodity ETFs from other

ETFs.

Buyuksahin, Haigh, and Robe (2008) examined the S&P Goldman Sachs

Commodities Index (S&P GCSI) weekly return data and revealed no correlation

between commodity and equity returns, with commodity returns being lower

and no sustained pricing relationship over the long-run. Nor was evidence found

of an increase in co-movement between equities and commodities during a

period of extreme returns.

Engle & Sarkar (2006) found that where an arbitrage opportunity exists

between the price of a given ETF and its Net Asset Value (NAV), this will be

minor and quickly exploited eliminate any premiums/discounts. Miffre (2007)

shows that higher than average returns can be achieved by investors taking

long or short positions for a given level of risk when investing in iShares MSCI

country funds ETF.

3 Methodology

3.1 Hypotheses

The objective of this study is to determine if by investing in ETFs GLD (gold)

and SLV (silver), investors can achieve abnormal or above market returns due

to inefficiencies. The study assumes gold and silver pricing are driven by

fundamental supply and demand forces in the marketplace.

Time series analysis is used and based on a daily close price/return series

derived from the NYSE Arca between 5th of May 2006 and 31st of December

2009. Within the study period the macro economy has experienced periods of

growth, severe contraction and return to low/mild growth while maintaining

extraordinarily high demand for gold and silver. It is thus an ideal period as for

~ 6 ~

ascertaining whether dependence actually exists in the series and returns are

abnormal.

On this basis our hypothesis is that;

a) Returns between gold backed ETF GLD and silver backed ETF SLV are

independent and inefficient, meaning that an abnormal return is possible if

investing in GLD/SLV.

b) GLD and SLV provide investors with abnormal returns once adjusted for risk.

Expressed as;

CAPM (1)

(RGLD - Rf)t = + β (Rmt − Rf)t + ųt

(RSLV - Rf)t = + β (Rmt − Rf)t + ųt

c) The application of filter trading rules to GLD and SLV will result in an

evaporation of the abnormal return.

3.2 Data

GLD and SLV are the primary funds focused on in the analysis. However four

additional funds on the NYSE Arca, the gold funds IAU, and DGL, and silver

funds DBS and AGQ, were randomly chosen from twenty-one precious metals

ETFs. The key criteria are the fund must trade gold or silver, be an ETF (not an

ETN), use a passive strategy, offer minimum net assets of $50 million and have

begun trading no later than December 2008. This criteria reduced the applicable

funds for research from twenty-one to six. We also included the S&P 500 as a

benchmark. The data has been sourced from the Centre for Research and

Securities Prices (CRSP).

Table 1 about here

The time horizon is from 5 May 2006 to 31 December 2009 for the funds GLD

and SLV. As the inception date of the four remaining ETFs IAU, DGL, DBS and

AGQ varies between January 2005 and December 2008 the dataset for these

funds is adjusted to the each funds applicable lifetime. Data will be based on

daily closing prices of units/shares in each of the funds.

~ 7 ~

3.3 Methodology – Hypothesis a) Abnormal Returns

To prove hypothesis a) we must show that a dependency exists in series

between prices. Conversely, if GLD and SLV were efficient, then our expectation

is that today’s price is not reflective of yesterday’s price, rather it is reflective of

today’s information and that there is no pattern or consistency so that pricing

patterns follow a random walk, or that they are normally distributed.

The method for completing the test of hypothesis a) is to perform econometric

tests across 923 observed price movements for GLD/SLV and between 261 to

925 observations for the remaining funds (IAU/AGL/DBS/AGY) when extending

the test. For consistency with previous tests, the price series for GLD and SLV

are converted to (natural) logarithms, or continuously compounded returns. The

data set for this test is unfiltered as the use of logarithms reduces the affects of

outliers.

Previous work in the study of efficiency in physical gold and silver pricing by

Solt & Swanson (1981) identifies the occurrence of non-normality and non-

stationarity in the means and variances of the price for distributions, concluding

that (despite an element on non-stationarity) there exists a positive

dependence in the series. Snedecor & Cochran (1967) likewise reject the

presence of normality for a price series of gold.

Establishing stationarity is important for ensuring the significance of the series

to predict future prices and will be tested using the autocorrelation function

(ACF) and supported by the augmented Dickey-Fuller test (ADF). The series is

expected to exhibit the following characteristics if stationary;

Mean: E(Y)t = ų (2)

Variance : var(Y)t = E(Yt – ų)2 = 2

Covariance : Yk = E[(Yt – ų) (Yt+k – ų)]

where Yt is the time series.

Under the non-stationary process zero mean and constant variance is

expected. A test of non-normality is then done using the Bera-Jarque method or

to determine whether t are normally distributed. The test will support the

outcome of the ADF unit root test.

~ 8 ~

3.4 Methodology – Hypothesis b) Risk Adjusted Returns

Hypothesis b) tests for risk-adjusted abnormal returns using the Sharp-Linter-

Mossin model based on linear regression, otherwise known as the capital asset

pricing model (CAPM). This follows Aggrawal & Soenen (1988) we use daily

return series rather than convert into monthly returns. The market return is

based on the S&P500 (ETF SPY) and the risk free rate is represented by the US

90 day Treasury bill rate. The beta coefficient (β) measures systematic risk. The

market return (Rm) is represented by the ETF SPY, the broad based S&P 500. An

aggregate of the market being the return on exchanges NYSE, AmEx, NASDAQ

and Arca was also tested in the model to represent (Rm), the outcomes were

comparable to that when using SPY. The CAPM is expressed as;

CAPM: (3)

(RGLD - Rf)t = + β (Rmt − Rf)t + ųt

(RSLV - Rf)t = + β (Rmt − Rf)t + ųt

A consistent risk adjusted return by the applicable gold and silver ETF that

exceeds the market will show the presence of abnormal returns for these

investment vehicles. A Whites test (1980) is performed to ensure constant

standard errors in the coefficients, or that homoskedasticity is present, thus

proving the second assumption of the classic linear regression model (CLRM)

holds;

CLRM Second Assumption: (4)

var(t) =2 and <

3.5 Methodology – Hypothesis c) Filtered Trading Rules

We then use filter trading rules to test for profitable trading strategies. Filter

trading is a strategy determining by percentage movement of the stock price at

which points in time investors should buy or sell stock, or in other words what is

the entry and exit point of the trade, using 2% price movements as buy/sell

signals. This was applied retrospectively to GLD and SLV to ascertain abnormal

returns for the period December 2006 to December 2009. Testing is not

extended to the smaller funds. The benchmark is the stock performance of a

~ 9 ~

passive buy and hold strategy rather than the market return or SPY. The

starting point for the filer will follow whichever movement comes first, either

upward or downward. Log returns are used and applied to filters at 3% and 5%

in regression analysis against the market, following Solt & Swanson (1981).

4 Results

4.1 Preliminary Analysis

We plotted real gold and silver versus GLD and SLV to provide an indication of

the relative performance of each investment over the same time horizon, using

a 100 at 19 November 2004, and an end date at 31 January 2009.

Graph 1 about here

Graph 1 indicates that the movement between GLD and gold is quite close with

very little separation between the two trend lines particularly since August

2008, and coinciding with a steady trend in appreciation of the asset.

Graph 2 about here

Graph 2 shows similar results for SLV and silver, with no significant points of

deviation between the two return series. Comparing the time series of GLD

versus the market, shows GLD has outperformed the market benchmark,

represented by SPY especially since August 2007. A comparison of SLV versus

the market (SPY) shows a smaller outperformance. SLVs performance began

from a lower base relative to the market and comparative to GLD and was not

as resilient when systematic risk increased. In fact SLV found support at a

similar time to the market and began an upward trend with much larger

oscillations.

Graph 3 about here

Graph 4 about here

4.2 Analysis and Results - Hypothesis a) Abnormal Returns, Stationary Testing

Graphical analysis provided a starting point to the analysis, offering a feel for

the nature of the data and indication whether stationarity or autocorrelation is

present before more formal tests are performed. The trend for the GLD

~ 10 ~

distribution offered no indication of a random walk with or without drift, or a

process that is non-stationary. Rather the process for the GLD distribution is

more consistent with white noise, or a stationary process that has a relatively

even distribution over time. The same inference can be drawn for the SLV

distribution.

The graphed series also indicated that negative autocorrelation may exist

given the pattern where the residuals appear to cross the axis on a frequent

basis that would be unusual for a random walk distribution. Detecting a

autocorrelation relationship suggests that there is some inefficiency in the

market and that successive price changes are not independent

The Autocorrelation Function (ACF) is the first important formal test of the

hypothesis and is in itself a relatively simple test for detecting dependence in

the series. The test is expressed as;

ACF: (5)

pk = Yk / Y0

= covariance at lag k / variance

Plotting pk against k will produce the correlogram although determining the

appropriate lag length in the equation is important when performing the test

and requires a trial and error process. Akaike and Schwarz information criterion

has been chosen as the preferred method to assess the appropriate amount of

lags.

Lags were tested from 5 to 35 at intervals of five and the model offering the

lowest value in these criteria selected. The lowest value was achieved at 5 lags

where it gave Akaike Schwarz information values at -5.49 and -5.45. 5 lags

were then applied to both the GLD and SLV ACF and correlogram.

Figure 1 about here

The results for show GLD shows all lags are close to ‘0’, so GLD exhibits a

typical stationary process. The AC range is between -.033 and .021, and the

PAC range is the same, there is very little difference from zero.

Figure 2 about here

~ 11 ~

The correlogram for the SLV distribution shows AC and PAC values are

insignificantly different from zero across all lags. SLV exhibits stationarity.

The analysis is now extended to the augmented Dickey-Fuller (ADF) test which

is also known as the unit root test. The ADF test is suitable for larger sample

consists of the following regression estimate;

ADF: (6)

m

i

ttittt iYYY1

21 1

The ADF test is one sided and offers three critical values at 1%, 5% and 10%

in comparison to a test statistic. When performing the test lags are applied to

ensure that the error terms are uncorrelated and that unbiased estimates can

be obtained preventing the test from to easily accepting whether a unit root is

present.

Figure 3 about here

The test results for the GLD distribution are shown in Figure 3. The ADF for the

GLD logarithm is performed with 5 lags applied to the dependent variable. The

ADF test-statistic does not exceed the critical values at any level and has

determined that the series is stationary; it does not have a unit root.

When extending the ADF test at intervals of five to include up to 35 lags the

result in terms of the ADF statistic relative to the critical values was consistent

which we would expect. The bulk of the coefficients produced at various lags are

also shown to be negative implying a relationship exists between price changes.

The same tests are now performed on the series for SLV. The results suggest

the null hypothesis should also be rejected. The series is stationary with zero

mean.

Figure 4 about here

However, the SLV series is differentiated from the GLD series as even though

the majority of coefficients are less than zero; the amount is fewer than those

applicable to GLD as more lags are applied.

The ADF result is consistent with that obtained by Solt & Swanson (1981) that

the rejection of a unit root in silver is slightly less obvious than that in gold, or

~ 12 ~

applied to this model that the rejection of a unit root in SLV is slightly less

obvious than that evident in GLD. The implication is that not all available

information is reflected in the prices for GLD and SLV as would the case if the

market were truly efficient instead of weak form efficient.

When expanding the ACF, Correlogram and ADF test to funds IAU, DBS, DGL

and AGQ, the tests produced comparable results to those obtained from GLD

and SLV. The logarithmic distribution was found to possess no unit root. As with

GLD and SLV the outcome did not change when increasing or decreasing lags.

The findings held across all the four smaller funds although the number of

observations included in the analysis was in most instances lower than those in

the test of GLD and SLV, with AGQ having the least number of observations at

265. The null hypothesis tested in this case that the residuals have a unit root

was rejected. This suggests that market for gold and silver ETFs is inefficient.

For the purpose of forecasting and predicating price a non-stationary process

is of little value to investors. It only holds for the period for which it was

studied.

4.3 Analysis and Results - Hypothesis a) Abnormal Returns, Normality Testing

To complete the analysis of hypothesis a) a test of normality is undertaken and

applied to each fund. The null hypothesis tested in this case is that the residuals

(ųt) are normally distributed; the alternate hypothesis is that they are non-

normally distributed. Evidence of a non-normal distribution would identify

dependence in the series, or that the series is weak form efficient. Such

information is relevant to investors because it will influence buy and sell

decision making as the investor attempts to predict when to enter or leave the

market. As the data for all funds in the analysis has been transformed into

logarithms and is based on a daily series, it has essentially had the effect of

rescaling the data and pulling in extreme observations hence filters will not be

applied to the test. It is also not necessary to filter in the cases of GLD, SLV and

IAU due to the large amount of observations which encourages a normal

distribution.

Table 2 about here

~ 13 ~

The summary statistics for each fund are available in Table 2 and include

values for mean, median, skewness, kurtosis and the Bera-Jarque statistic. An

examination of the summary statistics across all funds shows the mean and

median for GLD to be the closest, at .0005 and .0008 respectively, suggesting

there is a presence of normality in the series which is typical where the mean is

close to zero. The mean and median for IAU, DGL and AGQ are also quite close

and in some ways similar to GLD, indicating normality as there is not significant

differences between each mean and median value. The variation between the

mean and median values for IAU and DGL is both at .0004 whereas AGQ is at

.004. Conversely, the mean and median for SLV are comparatively further apart

at .0002 and .003, as is that for DBS at .0003 and 003. Both of these funds

trade silver rather than gold and have an initial indication of non-normality.

The table also shows the Bera-Jarque statistic, and the measures of skewness

and kurtosis. Aggarwal & Soenen (1988) found strong kurtosis for the gold

price. We found strong positive kurtosis above the critical threshold of ‘3’ for all

funds except AGQ., with a range between 8.310 for IAU at the lower end to

13.48 for DBS. The primary funds GLD and SLV offer kurtosis values of 7.5385

and 9.9616. The exception to the strongly positive kurtosis identified in the

funds is silver fund AGQ which is below ‘3’, though this is based on fewer

observations. These results suggest dependency exists in the series for GLD,

SLV, IAU, DGL and DBS.

The Bera-Jarque is an asymptotic test based on OLS residuals. The test first

calculates skewness then kurtosis testing whether the coefficients are jointly

zero (Brooks, 2002). The norms of skewness and kurtosis are 0 and 3. The test

statistic is given by;

Bera-Jarque Test: (7)

BJ = n

S2 + (K – 3)2

6 24

The Bera-Jarque output statistics are extremely high and very different from

zero for all funds excluding AGQ. DBS has the highest Bera-Jarque value at

3608 while the lowest excluding AGQ is DBS at 642, comparatively GLS and SLV

give values at 797 and 2026. This value of these statistics means the skewness

~ 14 ~

and kurtosis statistics is not jointly zero suggesting the null of normality should

be rejected.

The p-value of the Bera-Jarque statistics is at the same time very low which

we would expect; it is zero for all funds except AGQ. To not reject the null of

normality the p-value should be larger than .05 at the 5% test level (Brooks,

2002). As the p-value is below .05 normality is conclusively rejected based on

these outputs particularly given that the number of observations is large, at

least for funds GLD, SLV, IAU and DBS. Comparatively the p-value for AGQ is

greater than .05, meaning normality would not be rejected and fitting with the

low Bera-Jarque value of 1.36.

Reviewing the residuals provides method to understand the shape of a

distribution. The curve of the histograms under the formal Bera-Jarque test is

representative of the frequency distribution and is consistent with that of a

leptokurtic rather than a normal distribution. For GLD, SLV, IAU, DGL and DBS

the distribution is shown to be high peaked and steep, so the distributions are

not perfectly normally distributed. The curve for AGQ is conspicuous given is

has a bell shaped curve more consistent with a normal distribution, although

the number of observations in the series is somewhat fewer than those of other

funds.

The existence of non-normality in the GLD and SLV series seems apparent; it

establishes that GLD and SLV ETF prices behave like actual gold and silver

prices giving some assurance around the funds ability to offer an abnormal

return and consistency with prior studies into the distribution characteristics of

the physical assets. The null is likewise rejected for funds IAU, DGL, DBS, but

not for AGQ, although it seems logical that as the AGQ series extends from its

current short base it will experience non-normality as the other funds have

done.

4.4 Analysis and Results - Hypothesis b) Risk Adjusted Returns, CAPM

It is not enough to purely suggest abnormal returns are present in gold and

without also considering the return within a risk return framework. The Capital

Asset Pricing Model (CAPM) is the chosen tool with which to assess and explain

the relationship as to whether GLD and SLV maintain the ability to offer

investors an above market (expected) return relative to the risk of investing in

the funds. The CAPM is a multivariate linear regression using least squares and

~ 15 ~

when performed in this analysis uses the logarithmic return series for GLD and

SLV, both these series and the series of other funds used in the analysis are

represented by (Ri).

Figure 5 about here

The results for GLD are shown in Figure 5. The slope coefficient, which is the

beta coefficient at 0.6095 is less than one, suggesting that the security moves

less than the overall market and is not volatile, it is rather defensive in nature

and suggests GLD would have some risk reducing properties in a portfolio of

stocks. The intercept term at -0.0091 is not significantly different from zero.

This means there is no significant risk adjusted abnormal return for GLD for the

time period.

When applying sub-period analysis to GLD, the horizon of 1 May 2006 through

1 May 2007 are analyzed to assess whether in its relatively immature state if

GLD demonstrated greater inefficiency. The beta coefficient over horizon is

0.5721, marginally less than the beta over the three years showing that GLD

was slightly less risky over this period. The intercept is -0.0211, meaning that

GLD did not offer a risk adjusted abnormal return for the period.

The horizon 1 March 2008 to 1 March 2009 is also tested, this will identify

whether the rapidly increasing systemic risk produced an abnormal return if

investing in GLD during the period of the financial crisis. The beta coefficient for

the period showed a remarkable difference from the three years or one year

immaturity period being 0.0491, meaning that GLD was highly defensive when

market high risk was greatest.

This conclusion is similar to Aggarwal & Soenen (1988), though, it is important

to point out that the asset, GLD in our case, has appreciated significantly during

the period used in the analysis offering investors a strongly positive return on a

simple passive or buy and hold strategy.

Figure 6 about here

The results SLV are in Figure 6 and are similar to those of GLD, with a low beta

at 0.06503 and intercept at -0.0084. The same conclusions can be drawn that

the return does obviously display the characteristics of inefficiencies when risk is

~ 16 ~

considered. The same quantity of daily logarithmic observations, 922 is used as

that used in the GLD CAPM regression.

The SLV beta coefficient is less than one, evidence that the asset is not

particularly volatile and is therefore risk reducing in nature within a portfolio.

This would allow the investor to purchase other more risky assets with a higher

beta and still maintain an acceptable level of chosen risk by including SLV (or

GLD) to help balance the basket of assets. The model has a minor negative

intercept coefficient at -0.0084 signifying that investors into SLV actually earned

a rate less than the market requires. The same is true of course for GLD, and

like GLD, SLV has experienced notable appreciation in underlying asset values

during the GFC although not to the same extent.

When applying sub-period analysis to SLV, the horizon of 1 May 2006 through

1 May 2007 are analyzed to assess whether in its relatively immature state if

SLV exhibited greater inefficiency. The beta coefficient over horizon is 0.4732,

noticeably lower than the beta over the three years showing that SLV was a less

risky investment over this period. The intercept is -0.0084, meaning that SLV

did not offer a risk adjusted abnormal return for the period.

The horizon 1 March 2008 to 1 March 2009 is also tested, this will identify

whether the significant increase in systemic risk produced an abnormal return if

investing in SLV during this horizon. The beta coefficient for the period was

much lower than the three years or one year immaturity period being 0.2844

and suggesting that SLV represented a low risk when market risk was greatest.

What is unexpected when comparing the regression results across all horizons

tested is just how close the SLV intercepts and slope values are to GLD,

especially given the differences in the supply and demand characteristics of the

underlying assets. Another point of difference between the funds is the

enormous disparity between market capitalizations, with GLD $35 billion USD

larger.

As with GLD the standard errors under the SLV CAPM regression offer certainty

around the model as they are reasonably small at .0277. R2 at 0.2791 implies

the model fits the data slightly better than that for GLD and that it explains the

return variability. Nonetheless the generic outcome of the CAPM analysis is that

market inefficiencies are not evident on a risk adjusted basis.

~ 17 ~

Applying the CAPM to IAU, DGL, DBS and AGQ offers a generally consistent

result to that obtained when completing the CAPM regression for GLD and SLV,

one of low market inefficiency.

The supporting regressions to some extent offer an additional element of sub-

period analysis given the shorter listing time frames available for gold backed

DGL and silver back DBS and AGQ. The shorter timeframes for these funds can

be identified by the lesser amount of daily observations in the regressions at

389, 750 and 267 respectively. In terms of actual time horizon these

observations reflect periods starting January 2007, June 2008, December 2008

and ending at 31 December 2009. The data in these sets is likewise converted

in to a logarithm for consistency with the method applied to the GLD and SLV

CAPM.

The beta coefficient of the IAU, DGL, DBS and AGQ beginning at lower end is

in the range of 0.6053 applicable to the larger fund by capitalization IAU, and

0.6625 for the smallest fund by capitalization DBS. The intercept is in the range

of -.0092 for IAU and .0024 for AGQ. With the exception of the AGQ intercept,

the results are not overtly dissimilar from that produced in the regressions for

the GLD and SLV. What is unusual is the very low beta applicable to gold fund

DGL. At 0.1143 it is noticeably lower than that of the other funds and evidence

of extraordinarily low relative volatility, in terms of market cap the fund is much

smaller than GLD only $159.03 million USD.

The intercept is slightly positive on AGQ, whereas in all other tests the

intercept is negative. This indicates AGQ investors earned slightly more than the

risk associated with the investment would require whereas investors in the

other funds earned slightly less. The value may be impacted by the fact that the

fund is relatively immature.

In summary application of the CAPM regression to gold and silver ETF’s GLD,

SLV, IAU DGL, DBS, essentially gave the same outcome as that drawn by

Aggarwal & Soenen’s investigation into the physical asset, that the there is no

indication of market inefficiency when adjusted for risk. In real terms this is

interpreted as non existence of abnormal returns, meaning the null hypothesis

of efficiency is accepted. What is interesting is that as investors have placed

increasing value on investment in gold and silver which is reflected in the

demand, the lack of a risk adjusted return is significant, it suggests investors

~ 18 ~

are willing to accept a return at a level below what the market would expect as

they seek to access the risk reducing properties.

4.5 Analysis and Results - Hypothesis b) Risk Adjusted Returns,

Heteroskedasticity

We then tested for heteroskedasticity or whether the variance of the error

terms is constant. Solt & Swanson (1988) discovered the presence of

heteroskedasticity in physical gold and silver series.

Detecting heteroskedasticity though is usually not easy and often a matter of

speculation (Gujarati & Porter, 2003). The price movements shown in Graphs 1

and 2 show strong movements over time, particularly for GLD, that suggests

the series may actually be affected by homoskedasticity, that the errors are

constant, the antithesis of heteroskedasticity. Graphical analysis though is not a

sufficient test hence the application of Whites (1980) test as will be

demonstrated. The use of Whites test differs from the approach taken by Solt &

Swanson (1981) who employed the Goldfield-Quandt test.

As price/return variability is expected to be a factor throughout the past three

years due to the economic climate and may affect the observations and create

outliers, in order to perform the test analysis logarithms are taken of each

series. Solt & Swanson experienced similar variability with the steeped change

in gold and silver pricing throughout the 1970’s and also used logarithms.

Outlying observations are a frequent driver of heteroskedasticity.

Whites (1980) test does not rely on the normality assumption and is a

conservative test. It is expressed as the following auxiliary regression;

Whites (1980) Test: (8)

viiiU XXXXXX iiiii

326

2

5

2

2433221

2

3

Disproving heteroskedasticity improves the validity of the analysis by ensuring

the OLS estimates give unbiased and consistent coefficient estimates (Brooks,

2002). However, as prior testing of physical gold and silver series behavior by

Solt & Swanson encountered the presence of a non-constant finite variance, it is

reasonable to expect the same phenomenon may occur in the ETF series given

the equivalent underlying properties of the asset.

~ 19 ~

Figure 7 about here

Figure 8 about here

The test results derived from the application of Whites (1980) test are

illustrated in Figures 7 and 8. Both outputs have a large critical value at 98.1 for

GLD and 32.1 for SLV. The implication of this test output is that it is likely the

two series possess a non-constant finite variance, especially as the given p-

value is zero. Note that heteroskedasticity is still present despite the conversion

of the series in to logarithms. For GLD and SLV the null hypothesis of constant

variance of the error terms is rejected.

IAU also evidences a very high critical value at 98.92 and a zero p-value, this

suggests the presence of heteroskedasticity. The behavior of the series is

essentially the same as that of GLD, which is not unexpected given the

underlying properties, maturity and number of observations that are all

consistent with GLD and that similar test result between the funds has been a

regular trait in the prior analyses. DGL, DBS and AGQ offer far lower critical

values that GLS, SLV and IAU, with the highest of these lower values being AGQ

at 14.73 and lowest being DBS at 1.68. The result is interesting because DBS

has a large number of observations at 750, which although less than GLD, SLV

and IAU, would generally give an expectation that the fund would provide as

similar outcome to those just mentioned yet it strongly suggests the presence

of homoskedasticity, so the null hypothesis of homoskedasticity would be

accepted for DBS.

Meanwhile DGL offers a critical value at 12.61 which would also suggest the

null is accepted for this fund. What is obviously different of the DGL, DBS and

AGQ series compared to GLD, SLV and IAU is that the p-values are above zero.

Particularly, the p-value for DBS which at .4304 relative to the other funds DGL

and AGQ at .0022 and .0006 is quite large. The p-value is important to note

because it indicates that randomness could explain the output whereas there

was not ability for randomness to explain the output for the funds.

The null hypothesis tested in the instance is that the variables have a constant

finite variance (homoskedasticity); the alternate hypothesis is that the variance

is non-constant (heteroskedasticity). Analysis of Whites test allows the null

hypothesis to be accepted for the funds DGL, DBS and AGQ only and rejected

for GLD, SLV and IAU. The funds accepting the null have far fewer observations

~ 20 ~

than those applicable in Whites test for GLD, SLV and IAU, indicating that over

shorter periods the presence of heteroskedasticity is not existent or at least it is

less evident and more difficult to detect.

4.6 Analysis and Results – Hypothesis c) Filtered Trading Rules

Note that when interpreting the results it is important to remember the large

increase in gold prices during the study period (2006-2009) and wider

macroeconomic factors caused by the Global Financial Crisis, which would have

increased systematic risk and induced a flight to gold and to a lesser extent

silver as investors sought to diversify assets. By December 2009 gold had at the

time reached new highs after earlier exceeding the $1,000 per ounce mark that

once appeared unobtainable. The trend is common to that of the gold

appreciation during the 1970’s ‘stagflation’ identified in Solt & Swanson’s (1981)

work. The two periods are both times of increased liquidity in gold.

An investor employing a passive (buy and hold) strategy could have easily

made large returns simply due to the wider factors mentioned. However we wil

test whether there was an opportunity to generate an even greater return

relative by active filter trading or momentum strategies.

Aggarwal & Soenen (1988) found that under Fama & Blume’s (1973) filters

gold performed poorly and at no point did the return from filter trading exceed

what the investor could have obtained via a passive strategy. Solt & Swanson

(1988) found that on a gross basis applying filters to gold and silver likewise

provides a poor result and that the investor would have been better to employ

the buy and hold strategy. Solt & Swanson note that the result did not differ

when managing for transaction costs and concluded there appeared to be no

inefficiency to be exploited. Comparatively, Fama & Blume (1973) used 24

different filters finding that filters were inferior in almost all cases even when

ignoring transaction costs.

We tested market inefficiency through filter trading at 3% and 5% levels.

Given that a filter of 7%, 10% or up to 50% used by the other authors seems

extreme for a daily series these will not be tested. It would be unusual for a

stock to move 7%, 10% or 50% in one day. It has happened of course but the

infrequency means the filter is unsuitable for a practical test of a daily series.

Even over a month a movement 10% would equate to substantial market

~ 21 ~

movement, although Aggarwal & Soenen apply this filter to their monthly series

as does Solt & Swanson to their weekly series.

Undoubtedly during the period under consideration the affects GFC have

caused large downward and upward movements in values, nonetheless the

study stresses that these are less important when based on a daily series and

considers 3% and 5% more appropriate. The filters are applied to both upward

and downward movements. The series is converted to logarithms excluding

transaction costs which are ignored in the analysis. Note that as the funds are

based on capital growth a divided is not applicable in the analysis.

Table 3 about here

The results of filter trading applied to GLD and SLV are shown in Table 5. The

start point is a ‘sell’ for both funds. The results show for GLD that at the 3%

and 5% level filtered trading offers a better return than that applicable to buy

and hold or passive investing. The same result is shown when filters are applied

to SLV.

Out results differ from prior studies as filters marginally exceed the buy and

hold strategy. The filtered return implies that an investor can generate a return

greater than that available by having no trading strategy. The result is unique

as it differs from that applicable to physical gold and silver returns when filters

are implemented and is of value to investors seeking to trade the gold and

silver ETF market.

5 Conclusion

Study has examined the series of three gold-backed and three silver-backed

ETFs and tested their return performance against existing knowledge of physical

gold and silver behavior. The primary focus of the study was the largest gold

and silver funds by AUM being GLD and SLV with an extension of the tests

applied to IAU, DGL, DBS and AGQ.

The tests evidenced a remarkable consistency between the return properties of

the ETFs and the physical assets of gold and silver identified in prior studies.

The log of GLD and SLV is shown to be consistent with a stationary, white noise,

distribution, or that the market for GLD and SLV are inefficient and that prices

do not reflect all available information. Non-normality was also proven for all

funds except AGQ although some caution should be exercised given that the

~ 22 ~

funds are still maturing meaning inefficiency is not unexpected. This information

is important to investors as it shows that an abnormal return can be achieved at

today.

CAPM analysis revealed an abnormal return does not exist when considered in

a risk-return framework. For investors this means that when risk is factored

that cannot be assured to exceed the market’s risk adjusted performance.

Application of filter trading rules suggested a marginal above market return is

possibly at the 3% and 5% filter level, implying that an investor using a trading

strategy has a small opportunity to outperform a passive investor.

In summary the same fundamental behavior applicable to physical gold and

silver returns also applies to GLD and SLV exchange traded fund prices/returns.

Specifically, their price movements do not follow a random walk. However, we

show that such inefficiency which was not exploitable on physical gold and silver

in the past now provides an opportunity for abnormal returns through a simple

filter trading rule. This finding is especially important in lights of David

Swensen’s Yale model. Gold and silver EFTS may serve as the other asset class

which is particularly worthwhile in the portfolio as they are inefficient (and thus

profitable) and liquid, the best of both worlds. Our finding is also valuable to

investors as the markets interest in gold and silver ETFs has made the metals

tangible substitutes to other more traditional securities and is unlikely to abate

in the near future. As the ETF vehicle and especially interest in gold continues to

grow in an exponential manner undoubtedly more investigation into funds

performance will be of interest.

+++++++++++++++++++++

References

Aggarwal, R., and Soenen, L., (1988). “The nature and efficiency of the gold market”,

Journal of Portfolio Management, 1988, Spring, Vol 14, no.3 page 18 to 21

Anderson, S., Born, J., and Schnusenberg, O., (2010). “Closed-End Funds, Exchange

Traded Funds and Hedge Funds”. Springer: New York.

Birkner, C., and Collins, D., (2008). “Futures: News, Analysis & Strategies for Futures,

Options & Derivatives Traders”; August 2008, Vol. 37 Issue 9, p54-61, 3p

Brooks, C. (2002) “Introductory Econometrics for Finance”. Cambridge Univ. Press

~ 23 ~

Cohn, L., (2010). “It’s Not Just Gold That Glitters”. Kiplinger's Personal Finance;

Apr2010, Vol. 64 Issue 4, p49-49, 1p

De Jong Jr. J., and Rhee, S., (2008). “Abnormal returns with momentum/contrarian

strategies using exchange traded funds”. Journal of Asset Management; Dec2008, Vol.

9 Issue 4, p289-299, 11p

Delcoure, N., and Zhong, M., (2007). “On the premiums of iShares”. Journal of

Empirical Finance 14: 168-195.

Engle, R., and Sarkar, D., (2006). “Premiums-discounts and Exchange Traded Funds”,

Journal of Derivatives, 13, 26-47.

Fama, E., (1970). “Efficient Capital Markets: A Review of Theory and Empirical Work”.

Journal of Finance, May 1970, pp. 383-417

Fama, E., and Blume, M., (1973). “Filter Rules and Stock Market Trading”. Journal of

Business, April 1973, pp 226-261.

Fuhr, D., (2010). “Global ETF Assets Breakthrough $1 Trillion Milestone” Available from

http://uk.ishares.com/publish/repository/in_the_media/en/downloads/uk_press_71.pd

f accessed on April 2 2010, BlackRock Inc.

Hegde, S., and McDermott, J., (2004). “The Market Liquidity of Diamonds, Q’s and their

Underlying Stocks”. Journal of Banking and Finance, 23, 1043-1067.

Kelleher, E., (2010). “Routes to lower management charges”, Financial Times, February

26 2010.

Jarecki, H., (2007) “Commodities create the right mix”, Futures Magazine, February

2007

Miffre, J. (2007). “Country-specific ETFs: An efficient approach to global assets

allocation”. Journal of Asset Management 8(2): 112-122.

Murdock, M., Richie, N., (2008). “The United States Oil Fund as a hedging instrument”,

Journal of Asset Management, Volume 9, Number 5, December 2008 , pp. 333-

346(14) Palgrave Macmillan

Murphy, Rob and Wright, Colbrin, (2010) An Empirical Investigation of the Performance

of Commodity-Based Leveraged ETFs Available at SSRN: http://ssrn.com/abstract

=1668614

Nichols, J., (2010). “Silver underperforming”, Coin World, 8/2/10, Vol. 51 Issue 2600,

p38-38

Richie, N., and Madura, J., (2007). “Impact of the QQQ on Liquidity and Risk of the

Underlying Stocks”. Quarterly Review of Economics and Finance, 47, 411-421.

Solt, M., and Swanson, P., (1981). “On the Efficiency of the Markets for Gold and

Silver”, Journal of Business, vol. 54 no. 3 p453-478

http://www.ingentaconnect.com.ezproxy.massey.ac.nz/content/pal/jam;jsessionid=370tqblgq6866.alice

http://www.ingentaconnect.com.ezproxy.massey.ac.nz/content/pal;jsessionid=370tqblgq6866.alice

~ 24 ~

Appendix

Tables:

Table 1: Commodity ETFs Used in the Data Set

Table 1 lists the selected funds and the benchmark SPY by; ticker, name, funds family,

investment asset (either gold or silver), total net assets (USD), inception and the exchange the

fund is traded on (note the funds are often not limited to trading on one exchange).

Summary Table: Commodity ETFs Used in the Data Set

Ticker Name Fund Family Investment

Asset

Total

Net Assets

USD

Inception Date

Exchange

GLD

SPDR

Gold Shares

SPDRs Gold $40.5bn Nov

2004 NYSE Arca

IAU Comex Gold Trust

iShares Gold $2.75bn Jan

2005

NYSE

Arca

DGL DB Gold PowerShares Gold $159.03

m Jan

2007 NYSE Arca

SLV Silver Trust

iShares Silver $5.22bn Apr

2006 NYSE Arca

DBS DB

Silver PowerShares Silver $62.31m

June 2008

NYSE Arca

AGQ Ultra

Silver ProShares Silver

$171.23m

Dec 2008

NYSE Arca

Benchmark Equity ETF

SPY SPDR

S&P 500 SPDRs Equities $77.82bn 1993

NYSE Arca

(NYSE Arca, 2010)

Graph 1: GLD Return vs. Gold Return

GLD Return vs. Gold Return

0

50

100

150

200

250

300

19/1

1/2

004

19/0

2/2

005

19/0

5/2

005

19/0

8/2

005

19/1

1/2

005

19/0

2/2

006

19/0

5/2

006

19/0

8/2

006

19/1

1/2

006

19/0

2/2

007

19/0

5/2

007

19/0

8/2

007

19/1

1/2

007

19/0

2/2

008

19/0

5/2

008

19/0

8/2

008

19/1

1/2

008

19/0

2/2

009

19/0

5/2

009

19/0

8/2

009

19/1

1/2

009

GLD Return Gold Return

~ 25 ~

Graph 2: SLV Return vs. Silver Return

Graph 3:

Graph 4:

Figure 1: GLD Correlogram

SLV Return vs. Silver Return

0

50

100

150

200

250

300

19/1

1/2

004

19/0

2/2

005

19/0

5/2

005

19/0

8/2

005

19/1

1/2

005

19/0

2/2

006

19/0

5/2

006

19/0

8/2

006

19/1

1/2

006

19/0

2/2

007

19/0

5/2

007

19/0

8/2

007

19/1

1/2

007

19/0

2/2

008

19/0

5/2

008

19/0

8/2

008

19/1

1/2

008

19/0

2/2

009

19/0

5/2

009

19/0

8/2

009

19/1

1/2

009

SLV Return Silver Return

GLD Return vs. The Market and SPY

0

50

100

150

200

250

300

19/1

1/2

004

19/0

2/2

005

19/0

5/2

005

19/0

8/2

005

19/1

1/2

005

19/0

2/2

006

19/0

5/2

006

19/0

8/2

006

19/1

1/2

006

19/0

2/2

007

19/0

5/2

007

19/0

8/2

007

19/1

1/2

007

19/0

2/2

008

19/0

5/2

008

19/0

8/2

008

19/1

1/2

008

19/0

2/2

009

19/0

5/2

009

19/0

8/2

009

19/1

1/2

009

GLD Return Market Return SPY Return

SLV Return vs. The Market and SPY

0

20

40

60

80

100

120

140

160

26/0

4/2

006

26/0

6/2

006

26/0

8/2

006

26/1

0/2

006

26/1

2/2

006

26/0

2/2

007

26/0

4/2

007

26/0

6/2

007

26/0

8/2

007

26/1

0/2

007

26/1

2/2

007

26/0

2/2

008

26/0

4/2

008

26/0

6/2

008

26/0

8/2

008

26/1

0/2

008

26/1

2/2

008

26/0

2/2

009

26/0

4/2

009

26/0

6/2

009

26/0

8/2

009

26/1

0/2

009

26/1

2/2

009

SLV Return Market Return SPY Return

~ 26 ~

Figure 2: SLV Correlogram

Figure 3: GLD ADF Test

Figure 4: SLV ADF Test

~ 27 ~

Table 2: Summary Statistics

Figure 5: GLD CAPM

~ 28 ~

Figure 6: SLV CAPM

Figure 7: IAU CAPM

Figure 8: DGL CAPM

~ 29 ~

Figure 9: DBS CAPM

Figure 10: AGQ CAPM

Figure 7: Whites Test GLD

~ 30 ~

Figure 8: Whites Test SLV

Figure 13: IAU

~ 31 ~

Figure 14: DGL

Figure 15: DBS Figure 16: AGQ

~ 32 ~

Histogram 1: RGLD Histogram 2: RSLV

Histogram 3: RIAU Histogram 4: RDGL

~ 33 ~

Histogram 5: RDBS Histogram 6: RAGQ

Table 3: Filtered Trading Results

Buy and Hold Vs. Filtered Trading Average Return

~ 34 ~

Fund Period

3% Filter 5% Filter

Buy and

Hold Filtered

Buy and

Hold Filtered

GLD

1 May 2006 –

31 Dec 20009 -0.000118 0.001642 -0.000118 0.000864

SLV 1 May 2006 –

31 Dec 20009 0.0000 0.006430 0.0000 0.005630

Abnormal Returns in Gold and Silver Exchange Traded...

Documents

Transcript of Abnormal Returns in Gold and Silver Exchange Traded...