Pairs Trading: An implementation of the Stochastic Spread and Cointegration Approach

University of Amsterdam

Master Thesis

Pairs Trading:

An implementation of the Stochastic Spread and

Cointegration Approach

Author:

Nick Huurman

5631335

Supervisors:

Prof. dr. C.G.H. Diks

Dr. S.A. Broda

August 10, 2012

Contents

1 Introduction
2 Cointegration approach
2.1 Integration, cointegration and error correction
2.2 Theoretical framework for pairs trading
2.3 Johansen cointegration test
3 Stochastic spread model
3.1 The state-space model
3.2 Kalman Filter
3.3 The EM Algorithm
4 Trading design
4.1 Trading period
4.2 Pairs selection
4.2.1 Cointegration approach
4.2.2 Stochastic spread approach
4.3 Mean-Variance optimization
5 Evaluation
5.1 Sharpe ratio
5.2 Results
5.2.1 Stochastic Spread Approach
5.2.2 Cointegration Approach
5.3 Results using DAX index
6 Conclusion

Chapter 1

Introduction

History shows us that using a market neutral trading strategy can be a good way to invest your

money. Typically, such a strategy performs in a steady manner, regardless of whether the market

goes up or down, and returns come with low volatility (Vidyamurhty, 2004). These favourable

characteristics are achieved by trading a market neutral portfolio, which can be constructed by

going long and short in two assets that have the same beta (hence, a portfolio with zero beta),

which is also referred to as a spread portfolio.

This thesis will evaluate one particular market neutral trading strategy that has already been

used (and proved its value) for 25 years on Wall Street, namely pairs trading. Recent studies

tell us that pairs trading performs exceptionally well in turbulent markets, where mispricing of

stocks is more common (Gatev et al., 2006; Do et al., 2006; Baronyan et al., 2010). Baronyan

et al. (2010) even reported a 40 per cent net annual profit in the first year (2008) of the financial

crisis. This result shows that pairs trading, despite its 25-year existence, is still profitable and

therefore very relevant to investigate, especially with the recent turbulent stock market.

The concept of pairs trading is relatively simple and can be summarized as follows. To begin,

an investor has to find two securities of which the prices have historically moved together and are

therefore in a "relative equilibrium". Then, when the price difference between the two securities

widens, hence the securities are out of the relative equilibrium, the trader takes a long position

in the cheap security and a short position in the expensive security. Based on the past price

dynamics, the expectation of the investor is that the prices will converge back to their relative

equilibrium. If so, the long and short position are unwound and a profit is made.

The main difficulties of constructing a profitable pairs trading strategy lie evidently in using

the right method for selecting a suitable pair of securities and how and when to take a position

in the selected pair. A recent thesis by Yakop (2011) investigates a broad range of selection and

trading methods which are appropriate for pairs trading. He concludes that the model based

approaches perform best. Therefore, this thesis will investigate and analyse two different model

based approaches for pairs trading. The first approach is the cointegration approach, which is


based on the error correction model. The second method is the "stochastic spread" approach,

as introduced in Elliot et al. (2005).

The results of the selected pairs of both methods will be calculated with the use of two

different trading strategies. The first is a dynamic model for the number of positions taken in

the spread that is based on the mean-variance optimization procedure discussed in the paper of

Markowitz (1952). The second is the two-standard deviation approach, which is commonly used

in earlier literature (Yakop (2011), Gatev et al. (2006), Vidyamurhty (2004)). The main objective

for this thesis is to compare the performance of the cointegration approach with the stochastic

spread approach when implemented with the two aforementioned pairs trading strategies.

The thesis is organized as follows. Chapters 2 and 3 outline the two different approaches used for modelling the behaviour of a pair. Chapter 4 describes the different trading strategies, and Chapter 5 evaluates the results of both models under these trading strategies. The last chapter contains the conclusions of this thesis.

Chapter 2

Cointegration approach

The notion of cointegrated time series was first introduced by Engle and Granger (1987) and

is one of the ideas for which they received a Nobel Prize in economics in 2003. Cointegrated

time series possess characteristics that are very useful for pairs trading, such as a long-term

equilibrium with the associated property of mean reversion. In the first section of this chapter

the definitions of integration, cointegration and the error correction model (ECM) for a time

series are given. The second section gives the theoretical framework for pairs trading and in the

third section, the cointegration test proposed by Johansen (1991) is discussed.

2.1 Integration, cointegration and error correction

Before turning to cointegration, the definitions of weak stationarity and of an integrated time series are given:

Definition. An ordered sequence of random variables, i.e., a time series or process $\{x_t\}$, is weakly stationary (or second-order stationary) if the first two moments of the distribution of $\{x_t\}$ are constant and independent of time.

Definition. A time series that has a stationary, invertible ARMA representation after differencing $d$ times is said to be integrated of order $d$, denoted $\{x_t\} \sim I(d)$.

These two definitions become tangible in the example of a simple VAR model. Consider a $k$-dimensional VAR($p$) time series $\{x_t\}$ with a possible time trend, so that the model is

$$x_t = \mu_t + \Phi_1 x_{t-1} + \dots + \Phi_p x_{t-p} + a_t,$$

or

$$\Phi(B) x_t = \mu_t + a_t,$$

with

$$\Phi(B) = I - \Phi_1 B - \dots - \Phi_p B^p,$$

where $B$ is the backshift operator, the innovation $a_t$ is assumed to be Gaussian and $\mu_t = \mu_0 + \mu_1 t$, where $\mu_0$ and $\mu_1$ are $k$-dimensional constant vectors. From the definition of weak stationarity, it follows that the VAR($p$) system above can be stationary only if all zeros of the determinant $|\Phi(B)|$ lie outside the unit circle. In that case $\{x_t\}$ is unit-root stationary and is said to be not integrated (an $I(0)$ process) (Tsay, 2010). The definition of cointegration as stated in Engle and Granger (1987) is given next.

Definition. The components of the vector $\{x_t\}$ are said to be cointegrated of order $d, b$, denoted $\{x_t\} \sim CI(d, b)$, if (i) all components of $\{x_t\}$ are $I(d)$; (ii) there exists a vector $\Pi \neq 0$ such that $z_t = \Pi' x_t \sim I(d-b)$, $b > 0$. The vector $\Pi$ is called the cointegrating vector.

In the case $d = b = 1$, cointegration means that the equilibrium error $z_t$ is $I(0)$: it will rarely drift far from its mean and will often cross the mean line (Engle and Granger, 1987). A convenient way of representing the vector $\{x_t\}$ in terms of stationary series is the error correction model (ECM) representation, which avoids the issue of overdifferencing (Tsay, 2010, p. 431). The definition of the ECM is given next (Engle and Granger, 1987):

Definition. A vector time series $\{x_t\}$ has an error correction representation if it can be expressed as

$$A(B)(1-B)x_t = -\gamma z_{t-1} + u_t,$$

where $u_t$ is a stationary multivariate disturbance, $A(0) = I$, $A(1)$ has all elements finite and $\gamma \neq 0$.

In this representation of the ECM, only the disequilibrium in the last period is an explanatory

variable. However, by rearranging terms, any kind of set of lags can be written in this form.

Therefore, this representation of the ECM permits any type of gradual adjustment towards a

new equilibrium (Engle and Granger, 1987).

2.2 Theoretical framework for pairs trading

Define the observed price of stock $i$ at time $t$ as $P_{it}$ and let $p_{it} = \ln(P_{it})$ be the corresponding log price. A common assumption about $\{p_{it}\}$ in the literature (Tsay, 2010; Vidyamurhty, 2004) is that the series has a unit root and follows a random walk, $p_{it} = p_{i,t-1} + r_{it}$, where $r_{it}$ is the return. This unit-root assumption will be checked with the Augmented Dickey-Fuller (ADF) unit-root test.


Based on the arbitrage pricing theory (APT), if two stocks have similar risk factors, they should have similar returns. If this is the case, $\{p_{1t}\}$ and $\{p_{2t}\}$ are likely to be driven by a common component and are therefore cointegrated (Tsay, 2010). In formula, there exists a linear combination $w_t = \beta' p_t = p_{1t} - \gamma p_{2t}$, with $\beta = (1, -\gamma)'$, which is unit-root stationary and mean reverting. The two price series $\{p_{1t}\}$ and $\{p_{2t}\}$ can also be written in ECM form:

$$\begin{pmatrix} p_{1t} - p_{1,t-1} \\ p_{2t} - p_{2,t-1} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} (w_{t-1} - \mu_w) + \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix}, \tag{2.1}$$

where $\mu_w = E[w_t]$ denotes the mean of $\{w_t\}$, which is referred to as the spread between the two log stock prices.

The left-hand side of the ECM represents the log returns of both price series. The equation states that the returns depend on the stationary series $w_{t-1} - \mu_w$ and are therefore also stationary. Specifically, $w_{t-1} - \mu_w$ denotes the deviation from the long-term equilibrium between the two stocks, so the returns of the stocks (left-hand side of 2.1) depend on the past deviation from the equilibrium. The coefficients $\alpha_1$ and $\alpha_2$ show the effect of this past deviation on the returns $\{r_{1t}\}$ and $\{r_{2t}\}$, respectively. In practice, the coefficients $\alpha_1$ and $\alpha_2$ will have opposite signs, indicating the mean reversion behaviour of the stationary series.
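The mean-reverting mechanism in equation (2.1) can be illustrated with a small simulation. The following is a minimal sketch and not part of the implementation used in this thesis: the function name `simulate_ecm_pair`, the parameter values and the Gaussian shocks are assumptions chosen purely for illustration. With $\alpha_1 < 0$ and $\alpha_2 > 0$, the spread $w_t = p_{1t} - \gamma p_{2t}$ follows an AR(1) process with coefficient $1 + \alpha_1 - \gamma\alpha_2$, so it fluctuates around $\mu_w$ even though the individual log prices wander.

```python
import random


def simulate_ecm_pair(n, alpha1=-0.05, alpha2=0.05, gamma=1.0,
                      mu_w=0.2, sigma=0.01, seed=0):
    """Simulate two log-price series from the bivariate ECM (2.1).

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    p1, p2 = [mu_w], [0.0]  # start in equilibrium: w_0 = mu_w
    for _ in range(n):
        deviation = (p1[-1] - gamma * p2[-1]) - mu_w  # w_{t-1} - mu_w
        p1.append(p1[-1] + alpha1 * deviation + rng.gauss(0.0, sigma))
        p2.append(p2[-1] + alpha2 * deviation + rng.gauss(0.0, sigma))
    return p1, p2


p1, p2 = simulate_ecm_pair(5000)
spread = [a - 1.0 * b for a, b in zip(p1, p2)]  # w_t with gamma = 1
mean_spread = sum(spread) / len(spread)
print(f"sample mean of spread: {mean_spread:.4f} (mu_w = 0.2)")
```

With these assumed values the AR(1) coefficient of the spread is 0.9, so deviations from $\mu_w$ are corrected by roughly 10 per cent per period.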

2.3 Johansen cointegration test

For testing purposes, the ECM representation of a $k$-dimensional VAR($p$) time series $\{x_t\}$ becomes

$$\Delta x_t = \mu d_t + \Pi x_{t-1} + \Phi^*_1 \Delta x_{t-1} + \dots + \Phi^*_{p-1} \Delta x_{t-p+1} + a_t,$$

where the deterministic regressor $d_t$ (constant/trend) is added and $t = p+1, \dots, T$. Furthermore,

$$\Phi^*_j = -\sum_{i=j+1}^{p} \Phi_i, \qquad j = 1, \dots, p-1,$$

and

$$\Pi = \alpha\beta' = \Phi_p + \Phi_{p-1} + \dots + \Phi_1 - I = -\Phi(1).$$

The term $\Pi x_{t-1}$ is referred to as the error correction term, which plays a key role in the cointegration study (Tsay, 2010). If we assume that $\{x_t\}$ is at most $I(1)$, then $\Delta x_t$ is an $I(0)$ process. Three cases of interest can now be distinguished:

1. Rank($\Pi$) $= 0$. Hence, $\Pi = 0$ and $x_t$ is not cointegrated.

2. Rank($\Pi$) $= k$. Hence, $|\Phi(1)| \neq 0$, $x_t$ contains no unit roots, and one can model $x_t$ (which is $I(0)$) directly.

3. $0 <$ Rank($\Pi$) $= m < k$. Hence, $x_t$ has $m$ linearly independent cointegrating vectors and $k - m$ unit roots. Writing $\Pi = \alpha\beta'$, $\alpha$ and $\beta$ are $k \times m$ matrices with Rank($\alpha$) $=$ Rank($\beta$) $= m$.

As can be seen from the three cases above, the rank of the matrix $\Pi$ is sufficient to determine whether the time series $\{x_t\}$ is cointegrated. Therefore, a likelihood ratio (LR) test for determining the rank of $\Pi$ is described next, which is called the Johansen cointegration test. The hypotheses of this test are $H_0$: Rank($\Pi$) $= m$ versus $H_a$: Rank($\Pi$) $> m$. The value of $m$ starts at zero and is increased by one each time the null hypothesis is rejected; the first $m$ for which the null hypothesis is not rejected is taken as the rank. If the null hypothesis is rejected for every $m < k$, $\{x_t\}$ has the properties of the second case specified above.

The LR test statistic proposed by Johansen is defined as

$$LR_{tr}(m) = -(T-p) \sum_{i=m+1}^{k} \ln(1 - \hat\lambda_i),$$

where the $\hat\lambda_i$ (which should be small for $i > m$) are the squared canonical correlations between $u_t$ and $v_t$, the residuals of $\Delta x_t$ and $x_{t-1}$ after regressing each on the deterministic terms and the lagged differences. This test is also referred to as the trace cointegration test. The asymptotic null distribution of this test is not $\chi^2$, but a Dickey-Fuller-type distribution, which depends on $k - m$ and the deterministic components (Tsay, 2010).
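The sequential use of the trace statistic can be sketched in a few lines. This is an illustration under stated assumptions, not the routine used in this thesis: the function names `trace_statistic` and `select_rank` are hypothetical, the eigenvalues are made-up inputs, and the critical values are placeholders that would have to be replaced by tabulated values for the chosen deterministic specification.

```python
import math


def trace_statistic(eigenvalues, T, p, m):
    """LR_tr(m) = -(T - p) * sum_{i=m+1}^{k} ln(1 - lambda_i)."""
    lam = sorted(eigenvalues, reverse=True)  # lambda_1 >= ... >= lambda_k
    return -(T - p) * sum(math.log(1.0 - l) for l in lam[m:])


def select_rank(eigenvalues, T, p, critical_values):
    """Return the first m for which H0: Rank(Pi) = m is not rejected."""
    k = len(eigenvalues)
    for m in range(k):
        if trace_statistic(eigenvalues, T, p, m) <= critical_values[m]:
            return m
    return k  # rejected for every m < k: x_t itself is I(0)


# Hypothetical inputs for a pair (k = 2). The critical values are
# placeholders for illustration, not tabulated values.
eigs = [0.10, 0.01]
rank = select_rank(eigs, T=258, p=2, critical_values=[15.0, 4.0])
print(rank)
```

For a pair of stocks ($k = 2$), a selected rank of one corresponds to the third case above: one cointegrating vector and one common unit root.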

Chapter 3

Stochastic spread model

In this chapter I will describe a mean reverting Gaussian Markov chain model for the spread,

namely the stochastic spread model which is based on the paper by Elliot et al. (2005). Later in

this thesis the returns of this stochastic spread approach, when implemented as a pairs trading

strategy, are compared with the above mentioned cointegration approach using historical data.

3.1 The state-space model

At any given time, a pairs trading portfolio is associated with a quantity called the spread,

which is the difference between the quoted prices of the securities used. If the spread of the

portfolio is significantly different from the mean, a position in both securities is taken with the

expectation that the spread will revert to its mean (Vidyamurhty, 2004).

To model the mean-reverting behaviour of the spread explicitly, a state process $\{x_k \mid k = 0, 1, 2, \dots\}$ is introduced, where $x_k$ denotes the value of some variable at time $t_k = k\tau$ for $k = 0, 1, 2, \dots$. We assume that $\{x_k\}$ is mean reverting:

$$x_{k+1} - x_k = b\left(\frac{a}{b} - x_k\right)\tau + \sigma\sqrt{\tau}\,\epsilon_{k+1}, \tag{3.1}$$

where $\sigma \geq 0$, $b > 0$, $a \in \mathbb{R}$ and $\epsilon_{k+1} \sim N(0, 1)$. The above equation is a discretized Ornstein-Uhlenbeck process: $dX(t) = (a - bX(t))\,dt + \sigma\,dW(t)$.

Furthermore, it is easy to see that $x_k \sim N(\mu_k, \sigma_k^2)$, with

$$\mu_k = E(x_k) = a\tau + (1-b\tau)\mu_{k-1} = a\tau + (1-b\tau)[a\tau + (1-b\tau)\mu_{k-2}] = \dots = \frac{a}{b} - \frac{a}{b}(1-b\tau)^k + (1-b\tau)^k \mu_0,$$

and

$$\sigma_k^2 = Var(x_k) = (1-b\tau)^2\sigma_{k-1}^2 + \sigma^2\tau = \dots = \sigma^2\tau\,\frac{1-(1-b\tau)^{2k}}{1-(1-b\tau)^2} + (1-b\tau)^{2k}\sigma_0^2.$$

From these two equations the long-term mean and variance can be derived. For $k \to \infty$ (and $|1 - b\tau| < 1$):

$$\mu_k \to \frac{a}{b}, \qquad \sigma_k^2 \to \frac{\sigma^2\tau}{1-(1-b\tau)^2}.$$
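The closed-form expressions for $\mu_k$ and $\sigma_k^2$ can be checked numerically against the one-step recursions. The sketch below does this for one arbitrary set of parameter values; the function names and the numbers are assumptions chosen for illustration only.

```python
def moments_by_recursion(a, b, sigma, tau, mu0, var0, k):
    """Iterate the one-step recursions for the mean and variance k times."""
    mu, var = mu0, var0
    B = 1.0 - b * tau
    for _ in range(k):
        mu = a * tau + B * mu
        var = B ** 2 * var + sigma ** 2 * tau
    return mu, var


def moments_closed_form(a, b, sigma, tau, mu0, var0, k):
    """Evaluate the closed-form expressions for mu_k and sigma_k^2."""
    B = 1.0 - b * tau
    mu = a / b - (a / b) * B ** k + B ** k * mu0
    var = (sigma ** 2 * tau * (1.0 - B ** (2 * k)) / (1.0 - B ** 2)
           + B ** (2 * k) * var0)
    return mu, var


# Illustrative parameter values (assumptions, not estimates).
params = dict(a=0.5, b=2.0, sigma=0.3, tau=0.1, mu0=1.0, var0=0.04)
rec = moments_by_recursion(k=25, **params)
closed = moments_closed_form(k=25, **params)

# Long-run limits: a / b and sigma^2 * tau / (1 - (1 - b * tau)^2).
long_run = (0.5 / 2.0, 0.3 ** 2 * 0.1 / (1.0 - (1.0 - 2.0 * 0.1) ** 2))
print(rec, closed, long_run)
```

Both routes give the same moments, and for large $k$ they approach the long-run values $a/b$ and $\sigma^2\tau / (1 - (1-b\tau)^2)$.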


The state equation can be rewritten in the following way:

$$x_k = A + Bx_{k-1} + C\epsilon_k, \tag{3.2}$$

where $A = a\tau$, $B = 1 - b\tau$ and $C = \sigma\sqrt{\tau}$.

The latent variable $\{x_k\}$ defined above is used in the measurement equation, which defines the observed spread $\{y_k\}$ as a mean-reverting process with noise:

$$y_k = x_k + D\omega_k, \tag{3.3}$$

where $D > 0$ and $\omega_k \sim N(0, 1)$.

The model described above has three major advantages from a theoretical point of view.

The first one is rather obvious, namely the model is mean reverting. This is exactly what is

required of the spread between two stocks to implement a successful pairs trading strategy.

The second advantage is that the model for the spread is continuous in time, such that it is

convenient for forecasting purposes. Critical questions for pairs trading, such as the expected

holding period of the portfolio and the expected return of the strategy, can therefore be answered.

The third advantage is that the model is completely tractable. All the parameters can be

estimated using the Kalman filter and a maximum likelihood procedure called the EM algorithm.

In the next two sections, the Kalman filter and the EM algorithm will be discussed in detail.

3.2 Kalman Filter

To estimate the dynamical system of the stochastic spread model, a very useful tool called the Kalman filter (named after R.E. Kalman (Kalman, 1960)) is introduced. The Kalman filter is an algorithm for calculating linear least squares forecasts of the state vector on the basis of data observed through time $t$,

$$\hat{x}_{t+1|t} = E[x_{t+1} \mid \mathcal{Y}_t],$$

where $\mathcal{Y}_t = (y_t, y_{t-1}, \dots, y_1)$ collects the observations up to time $t$. The Kalman filter calculates these forecasts recursively, generating $\hat{x}_{1|0}, \hat{x}_{2|1}, \dots, \hat{x}_{t|t-1}$ in succession (Hamilton, 1994).

In this thesis, the Kalman filter is described as a four-step procedure and is based on the

description given in chapter 13 of the book of Hamilton (1994) and the paper of Elliot et al.

(2005). For convenience, the key features of a general state-space system are given first:

$$x_{t+1} = A + Bx_t + C\epsilon_{t+1}, \tag{3.4}$$

$$y_t = x_t + D\omega_t, \tag{3.5}$$

where $\omega_t$ and $\epsilon_t$ are both white noise processes.

For now it is assumed that the values of $A$, $B$, $C$ and $D$ are known; later these parameters are estimated with the EM algorithm of Shumway and Stoffer (1982).

To begin the Kalman filtering, the starting point of the recursion has to be set. Typically, it is set to $\hat{x}_{1|0} = E[x_1]$, which is just the unconditional mean of $x_1$. The associated mean squared error (MSE) of this starting point is therefore $P_{1|0} = E[(x_1 - \hat{x}_{1|0})^2]$.

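After defining the starting point, the first step is to forecast the state one period ahead, conditional on the observations $\mathcal{Y}_k = (y_k, \dots, y_1)$:

$$\hat{x}_{k+1|k} = E[x_{k+1} \mid \mathcal{Y}_k] = A + B\,E[x_k \mid \mathcal{Y}_k] = A + B\hat{x}_{k|k}, \tag{3.6}$$

with corresponding MSE

$$P_{k+1|k} = E[(x_{k+1} - \hat{x}_{k+1|k})^2] = B^2 P_{k|k} + C^2. \tag{3.7}$$

The second step of the Kalman filter is to forecast the observation $y_k$:

$$\hat{y}_{k|k-1} = E[y_k \mid \mathcal{Y}_{k-1}] = E[x_k \mid \mathcal{Y}_{k-1}] = \hat{x}_{k|k-1}. \tag{3.8}$$

The MSE of this forecast is therefore equal to

$$E[(y_k - \hat{y}_{k|k-1})^2] = P_{k|k-1} + D^2. \tag{3.9}$$

Next, the inference about the current value of $x_k$ is updated on the basis of the observation $y_k$ to produce

$$\hat{x}_{k|k} = E(x_k \mid y_k, \mathcal{Y}_{k-1}) = E(x_k \mid \mathcal{Y}_k). \tag{3.10}$$

Using the formula for updating a linear projection (Hamilton, 1994, p. 379) results in

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + E[(x_k - \hat{x}_{k|k-1})(y_k - \hat{y}_{k|k-1})]\,\bigl(E[(y_k - \hat{y}_{k|k-1})^2]\bigr)^{-1}(y_k - \hat{y}_{k|k-1}). \tag{3.11}$$

Since $y_k - \hat{y}_{k|k-1} = (x_k - \hat{x}_{k|k-1}) + D\omega_k$, the cross moment in (3.11) equals $P_{k|k-1}$. Written one period ahead, the update therefore becomes

$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_{k+1}(y_{k+1} - \hat{x}_{k+1|k}), \tag{3.12}$$

where $K_{k+1}$ stands for the Kalman gain and is given by

$$K_{k+1} = P_{k+1|k} / (P_{k+1|k} + D^2). \tag{3.13}$$

The estimate $\hat{x}_{k+1|k+1}$ is the best estimate of $x_{k+1}$ given $\mathcal{Y}_{k+1}$.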
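The recursion in (3.6), (3.7), (3.12) and (3.13) can be written compactly for the scalar model. The following is a minimal sketch, not the code used in this thesis: the function names and all parameter values are assumptions for illustration. The filtered MSE $P_{k|k} = (1 - K_k)P_{k|k-1}$ used in the loop is the standard update for this model; it is needed to iterate (3.7) but is not written out in the text above.

```python
import random


def simulate_spread(n, A, B, C, D, x0, seed=0):
    """Simulate observations y_1..y_n from the state-space model (3.4)-(3.5)."""
    rng = random.Random(seed)
    x, ys = x0, []
    for _ in range(n):
        x = A + B * x + C * rng.gauss(0.0, 1.0)    # state equation (3.4)
        ys.append(x + D * rng.gauss(0.0, 1.0))     # measurement equation (3.5)
    return ys


def kalman_filter(ys, A, B, C, D, x_init, P_init):
    """Scalar Kalman filter; x_init and P_init play the role of x_{1|0}, P_{1|0}."""
    x_pred, P_pred = x_init, P_init
    filtered, variances, gains = [], [], []
    for y in ys:
        K = P_pred / (P_pred + D ** 2)             # Kalman gain (3.13)
        x_filt = x_pred + K * (y - x_pred)         # update (3.12)
        P_filt = (1.0 - K) * P_pred                # filtered MSE (standard result)
        filtered.append(x_filt)
        variances.append(P_filt)
        gains.append(K)
        x_pred = A + B * x_filt                    # state forecast (3.6)
        P_pred = B ** 2 * P_filt + C ** 2          # forecast MSE (3.7)
    return filtered, variances, gains


# Illustrative parameter values (assumptions, not estimates).
A, B, C, D = 0.05, 0.9, 0.3, 0.2
ys = simulate_spread(200, A, B, C, D, x0=0.5)
filtered, variances, gains = kalman_filter(
    ys, A, B, C, D, x_init=A / (1.0 - B), P_init=C ** 2 / (1.0 - B ** 2))
print(f"final Kalman gain: {gains[-1]:.4f}")
```

The starting values used here are the unconditional mean and variance of the stationary state process, in line with the choice of $\hat{x}_{1|0}$ described above.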

3.3 The EM Algorithm

The Kalman filter assumes that the parameters of the state-space model are specified in advance. Normally this is not the case and the parameters have to be estimated. One widely used estimation method, described by Shumway and Stoffer (1982), is maximum likelihood using the EM algorithm; it is also used in this thesis and is discussed next.

In order to estimate the parameters of the state-space model defined by (3.4) and (3.5), the joint log likelihood of the model has to be specified. The dependence on the unobserved time series $\{x_k\}$ makes the specification of the likelihood function not straightforward. To solve this problem, the EM algorithm conditions on the observed time series $y_1, \dots, y_n$. Define the estimated parameters at the $(r+1)$st iterate as the values $\vartheta = (A, B, C^2, D^2)$ which maximize

$$G(\vartheta) = E_r[\log L \mid y_1, \dots, y_n], \tag{3.14}$$

where the conditional expectation $E_r$ is taken under the $r$th iterate values $A(r)$, $B(r)$, $C^2(r)$ and $D^2(r)$, and $\log L$ is the joint log likelihood of the complete data. The conditional means and covariances computed by the Kalman filter are conditioned on the full dataset $\mathcal{Y}_n = (y_n, \dots, y_1)$, which gives smoothed estimators of $\{x_k\}$:

$$\hat{x}_{k|n} = E(x_k \mid \mathcal{Y}_n),$$

$$P_{k|n} = E[(x_k - \hat{x}_{k|n})^2],$$

$$P_{k,k-1|n} = E[(x_k - \hat{x}_{k|n})(x_{k-1} - \hat{x}_{k-1|n})].$$

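The EM algorithm is a two-step iterative procedure that finds a stationary point $\hat\vartheta$ of the likelihood function in the following way:

Step 1 (the E-step): Compute (with $\tilde\vartheta = \vartheta_j$)

$$Q(\vartheta, \tilde\vartheta) = E_{\tilde\vartheta}[\log L \mid y_1, \dots, y_n].$$

Step 2 (the M-step): Find

$$\vartheta_{j+1} \in \arg\max_{\vartheta} Q(\vartheta, \tilde\vartheta).$$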
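To make the two steps concrete, the sketch below implements one possible EM iteration for the scalar model (3.4)-(3.5). It is an illustration under stated assumptions and not the code used in this thesis: the function names are hypothetical, the initial state $x_0$ is given a fixed prior mean `mu0` and variance `S0` that are not re-estimated, and the closed-form M-step follows from setting the derivatives of the expected complete-data log likelihood to zero. The E-step consists of a Kalman filter pass followed by a backward smoothing pass.

```python
import math
import random


def filter_smooth(ys, A, B, C2, D2, mu0, S0):
    """E-step: Kalman filter, backward smoother and observed log likelihood."""
    n = len(ys)
    xf, Pf = [mu0], [S0]        # filtered mean and MSE for k = 0..n
    xp, Pp = [None], [None]     # one-step predictions for k = 1..n
    loglik = 0.0
    for y in ys:
        x_pred = A + B * xf[-1]
        P_pred = B * B * Pf[-1] + C2
        F = P_pred + D2         # variance of the forecast error of y_k
        v = y - x_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * F) + v * v / F)
        xp.append(x_pred)
        Pp.append(P_pred)
        xf.append(x_pred + P_pred / F * v)
        Pf.append((1.0 - P_pred / F) * P_pred)
    xs, Ps, Pc = xf[:], Pf[:], [0.0] * (n + 1)  # Pc[k] = P_{k,k-1|n}
    for k in range(n - 1, -1, -1):
        J = B * Pf[k] / Pp[k + 1]
        xs[k] = xf[k] + J * (xs[k + 1] - xp[k + 1])
        Ps[k] = Pf[k] + J * J * (Ps[k + 1] - Pp[k + 1])
        Pc[k + 1] = J * Ps[k + 1]
    return xs, Ps, Pc, loglik


def em_step(ys, A, B, C2, D2, mu0, S0):
    """One EM iteration; returns updated parameters and the old log likelihood."""
    xs, Ps, Pc, loglik = filter_smooth(ys, A, B, C2, D2, mu0, S0)
    n = len(ys)
    m1, m0 = sum(xs[1:]), sum(xs[:-1])
    S11 = sum(x * x + P for x, P in zip(xs[1:], Ps[1:]))
    S00 = sum(x * x + P for x, P in zip(xs[:-1], Ps[:-1]))
    S10 = sum(xs[k] * xs[k - 1] + Pc[k] for k in range(1, n + 1))
    B_new = (n * S10 - m1 * m0) / (n * S00 - m0 * m0)
    A_new = (m1 - B_new * m0) / n
    C2_new = (S11 + n * A_new ** 2 + B_new ** 2 * S00 - 2.0 * A_new * m1
              - 2.0 * B_new * S10 + 2.0 * A_new * B_new * m0) / n
    D2_new = sum((y - x) ** 2 + P for y, x, P in zip(ys, xs[1:], Ps[1:])) / n
    return (A_new, B_new, C2_new, D2_new), loglik


def run_em(ys, theta, mu0, S0, n_iter):
    """Repeat the E- and M-step n_iter times, tracking the log likelihood."""
    logliks = []
    for _ in range(n_iter):
        theta, ll = em_step(ys, *theta, mu0, S0)
        logliks.append(ll)
    return theta, logliks


# Simulated data with hypothetical parameters A=0.2, B=0.8, C=0.5, D=0.5.
rng = random.Random(1)
x, ys = 1.0, []
for _ in range(300):
    x = 0.2 + 0.8 * x + 0.5 * rng.gauss(0.0, 1.0)
    ys.append(x + 0.5 * rng.gauss(0.0, 1.0))

theta, logliks = run_em(ys, theta=(0.0, 0.5, 1.0, 1.0),
                        mu0=1.0, S0=1.0, n_iter=30)
print(theta, logliks[-1])
```

A useful sanity check on any EM implementation is that the observed log likelihood never decreases from one iteration to the next; the list `logliks` makes this easy to inspect.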

Figure 3.1 shows a generated spread (using the parameters in Elliot et al. (2005)) and the fitted values of this spread obtained with the stochastic spread approach.


[Figure: spread plotted against time; x-axis: Days, y-axis: Spread]

Figure 3.1: The fitted values of the Stochastic Spread approach (green line) and the simulated spread (blue line)

Chapter 4

Trading design

This chapter discusses the trading strategy used in this thesis. In the first section, the trading

period is described. The second section sets out the pairs selection criteria for the two model

based approaches described in the former chapters. In the third section, the mean-variance

optimization theory of Markowitz (1952) for determining the optimal number of positions in the

spread, is discussed.

4.1 Trading period

The data used in this thesis consist of daily closing prices of the stocks of the Amsterdam Stock Exchange (AEX) for the period from 1 January 2006 until 30 December 2011, obtained from Thomson Reuters through Datastream Advance. Since an equilibrium between two stocks is unlikely to persist over the whole sample, the data are divided into blocks consisting of a formation period and an adjacent trading period. The length of the formation period is chosen arbitrarily and set to 128, 256 or 512 days. The adjacent trading period is set to half the number of days of the formation period, as is done in earlier literature (Gatev et al. (2006), Yakop (2011)). In the trading period, positions in the spread are opened following either the mean-variance optimization procedure (discussed at the end of the chapter) or the two-standard-deviation strategy. Any positions in the spread that remain open are closed at the end of the trading period.

A rolling window of 40 trading days is used to start a new formation period. As a result of the rolling window, after the first 128, 256 or 512 days (the different lengths of the formation period), all remaining days in the dataset are used for trading and no opportunities are lost.
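The layout of formation and trading periods can be sketched as a generator of index ranges. This is one reading of the design described above, with several assumptions made explicit: days are indexed from 0, ranges are half-open, a new formation period starts every 40 days, and the last trading period is truncated at the end of the sample. The function name `rolling_windows` is hypothetical.

```python
def rolling_windows(n_days, formation_len, step=40):
    """Yield ((f_start, f_end), (t_start, t_end)) as half-open index ranges."""
    trading_len = formation_len // 2     # trading period is half the formation period
    start = 0
    while start + formation_len < n_days:
        f_end = start + formation_len
        t_end = min(f_end + trading_len, n_days)   # truncate at the sample end
        yield (start, f_end), (f_end, t_end)
        start += step


# Example with a hypothetical sample of 1000 trading days.
windows = list(rolling_windows(n_days=1000, formation_len=128))
traded_days = set()
for _, (t_start, t_end) in windows:
    traded_days.update(range(t_start, t_end))
print(windows[0], len(windows))
```

Because the step of 40 days is shorter than the shortest trading period (64 days), consecutive trading periods overlap and every day after the first formation period falls in at least one trading period.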


4.2 Pairs selection

This section describes the criteria for selecting a suitable pair for the different methods.

4.2.1 Cointegration approach

As mentioned in chapter 2, $\{p_{it}\}$ is assumed to have a unit root and to follow a random walk model: $p_{it} = p_{i,t-1} + r_{it}$. This assumption is tested with the ADF test and if the null hypothesis (a unit root) is not rejected, the series $\{p_{it}\}$ is selected.

After selecting the time series $\{p_{it}\}$, all possible pairs are tested for cointegration with the Johansen test procedure. The model specified for testing is

$$\Delta x_t = \alpha(\beta' x_{t-1} - \mu_w) + c_0 + \Gamma_1 \Delta x_{t-1} + \epsilon_t,$$

where $x_t = (p_{1t}, p_{2t})'$, $\mu_w$ is the intercept of the cointegrating relation, $c_0$ the deterministic trend term and $\Gamma_1$ the coefficient matrix of the lagged difference.

Pairs for which the first hypothesis, $m = 0$, is rejected and the second hypothesis, $m = 1$, is not rejected are selected as suitable pairs; they have a mean-reverting spread $w_t$ with mean $\mu_w$. The spread portfolio is $w_t = \beta' x_t = p_{1t} - \gamma p_{2t}$, so against one stock of $\{p_{1t}\}$, $\gamma$ stocks of $\{p_{2t}\}$ are held, and $\alpha$ is the speed of adjustment parameter.

4.2.2 Stochastic spread approach

To select a pair suitable for trading, the spreads of all possible pairs are estimated with the EM algorithm and Kalman filter as discussed in chapter 3. After estimating the parameters of the model, the parameter $B$ of the state equation is evaluated. If $0 < B < 1$, the spread shows mean-reverting behaviour and the pair is selected for trading. The number of positions taken in the spread is again obtained using the mean-variance optimization strategy discussed below.

4.3 Mean-Variance optimization

This section will describe the mean-variance optimization procedure (MV), used for determining

the number of positions in a pairs trade. The concept of mean-variance optimization was first

introduced by Markowitz (1952). The main purpose of Markowitz’s paper was to mathematically

explain the behaviour of investors to diversify their portfolio. Markowitz claims that investors

do not only maximize the expected return of a portfolio, but also consider the variance of

the returns. In this thesis I will use Markowitz's "expected returns-variance of returns" rule to

optimize the number of positions held in a spread portfolio.

The rationale behind optimizing the number of positions in a spread portfolio lies in the mean-reverting behaviour of the spread of a pairs trade. No matter how large the deviation from the mean, the spread is always expected to revert to its long-term equilibrium value. In earlier literature about pairs trading, a fixed position in the portfolio is opened after the spread hits a pre-set threshold some distance (two standard deviations) away from the long-term mean (Yakop (2011), Gatev et al. (2006), Vidyamurhty (2004)). After hitting the threshold value, the position is held until the spread reverts to the mean. When this happens, the position is unwound and a profit is made. In the time between opening and closing the position, the spread may have become significantly larger than it was when the trader first opened the position. If this is the case, the trader can generate a much bigger profit by taking on more positions, proportional to the size of the spread.

In this thesis, the opportunity to generate a higher profit in a trade is explored by varying the number of positions. The positions taken in a spread are optimized using a utility function based on the aforementioned expected returns-variance of returns principle of Markowitz (1952), namely

$$U_t(wp_{t+1}) = E_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right] - \lambda\, Var_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right],$$

where $wp_t$ is the wealth of the portfolio at time $t$ and $\lambda$ is a constant that measures the risk aversion of the trader (set to one when the strategy is evaluated). Markowitz (1952) stresses that finding reasonable values for $E_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right]$ and $Var_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right]$ by using reliable statistical techniques is essential. Both the stochastic spread and the cointegration approach have these favourable characteristics. Now define $return_{t+1}$ as the value at time $t+1$ of a portfolio that invested one dollar in the spread at time $t$. Using this definition, the expected return and variance can be evaluated with the following equations:

$$E_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right] = z_t \times E_t\left[\frac{return_{t+1}}{wp_t}\right],$$

$$Var_t[r_{t+1}] = Var_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right] = z_t^2 \times Var_t\left[\frac{return_{t+1}}{wp_t}\right],$$

where $z_t$ represents the number of positions taken in the spread portfolio. The value of $E_t[return_{t+1}]$ is calculated using the parameters estimated in the formation period. The value of $Var_t[return_{t+1}]$ is estimated in the formation period and is assumed to be constant in the trading period.

The number of positions taken in the spread at any point in time can now be calculated by maximizing the utility function with respect to $z_t$. The first-order condition is

$$\frac{\partial U_t(z_t)}{\partial z_t} = E\left[\frac{return_{t+1}}{wp_t}\right] - 2\lambda z_t\, Var\left[\frac{return_{t+1}}{wp_t}\right] = 0.$$

Since the second derivative of the utility function is always negative ($\lambda > 0$, $Var[return_{t+1}] > 0$), solving this first-order condition for $z_t$ gives the number of positions in the spread that maximizes the utility function. This optimal value of $z_t$ at any point in time is

$$z_t = \frac{E[return_{t+1}]}{2\lambda\, Var[return_{t+1}]} \times wp_t.$$

The return $r_{t+1}$ of this strategy is given by

$$r_{t+1} = \frac{wp_{t+1} - wp_t}{wp_t} = z_t \times \frac{return_{t+1}}{wp_t}.$$

When the optimal value of $z_t$ is used, the return of the strategy is

$$r_{t+1} = \frac{wp_{t+1} - wp_t}{wp_t} = \frac{E_t[return_{t+1}]}{2\lambda\, Var_t[return_{t+1}]} \times return_{t+1}.$$

It can be seen that the returns of this strategy do not depend on the value of $wp_t$.
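The position rule and its independence from wealth can be verified with a short numerical example. This is a minimal sketch: the function names and the input numbers are assumptions for illustration, with the risk aversion set to one as in the evaluation.

```python
def optimal_positions(exp_return, var_return, wealth, risk_aversion=1.0):
    """Number of positions z_t that maximizes the mean-variance utility."""
    return exp_return / (2.0 * risk_aversion * var_return) * wealth


def utility(z, exp_return, var_return, wealth, risk_aversion=1.0):
    """Mean-variance utility as a function of the number of positions z."""
    return (z * exp_return / wealth
            - risk_aversion * z ** 2 * var_return / wealth ** 2)


def strategy_return(exp_return, var_return, realized_return, wealth,
                    risk_aversion=1.0):
    """Return r_{t+1} of the strategy when the optimal z_t is used."""
    z = optimal_positions(exp_return, var_return, wealth, risk_aversion)
    return z * realized_return / wealth


# Hypothetical inputs: expected return 0.02, variance 0.04, realized 0.05.
z_star = optimal_positions(0.02, 0.04, wealth=100.0)
r_small = strategy_return(0.02, 0.04, 0.05, wealth=100.0)
r_large = strategy_return(0.02, 0.04, 0.05, wealth=1_000_000.0)
print(z_star, r_small, r_large)
```

The number of positions scales with wealth, while the strategy return is the same for both wealth levels, as stated above.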

Chapter 5

Evaluation

This chapter gives an evaluation of the results of the two model based approaches discussed in chapters 2 and 3. The structure of this chapter is as follows. First, the definition of the Sharpe ratio is given and a few concerns with the calculation of Sharpe ratios, as explained in the master thesis of Yakop (2011), are discussed. In the second section, the results for both approaches are given. The last section gives out-of-sample results of the different pairs trading strategies.

5.1 Sharpe ratio

A common way to compare the returns of different trading strategies is to calculate the "reward-to-variability" ratio introduced by Sharpe (1966), nowadays called the Sharpe ratio. The Sharpe ratio relates the expected excess return of an investment to its return volatility. In formula,

$$SR = \frac{E[r_t] - r_f}{\sigma}, \tag{5.1}$$

where $E[r_t]$ and $\sigma$ are the expected return and standard deviation of the return series $\{r_t\}$, and $r_f$ is the average return earned by the benchmark in the evaluated period. The risk-free rate is usually assumed to be an adequate benchmark for comparing the returns of the strategy. As discussed in Yakop (2011), an adequate benchmark should act as an appropriate substitute for pairs trading. Therefore, Yakop (2011) did not use the risk-free rate, but the composite index of the stocks, in this case the AEX index. When calculating the Sharpe ratio with equation 5.1, $r_f$ is therefore set to zero. Afterwards, the calculated Sharpe ratios of the different trading strategies are compared to the Sharpe ratios of the AEX index.

The estimate of the Sharpe ratio, $\widehat{SR}$, is found by substituting $\hat\mu = \frac{1}{T}\sum_{t=1}^{T} r_t$ for $E[r_t]$ and $\hat\sigma = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(r_t - \hat\mu)^2}$ for $\sigma$, which are the estimated mean and standard deviation of the return series. As discussed in Yakop (2011), since $\widehat{SR}$ is based on $\hat\mu$ and $\hat\sigma$ (which are estimated with some error), $\widehat{SR}$ is also estimated with some error. Denoting the vector $(\mu\ \sigma)'$ by $\theta$ and the SR formula in equation 5.1 by $g(\theta)$, Lo (2002) shows that the asymptotic distribution of the SR estimator is given by

$$\sqrt{T}(\widehat{SR} - SR) \overset{a}{\sim} N(0, V_{GMM}), \qquad V_{GMM} = \frac{\partial g}{\partial\theta}\,\Sigma\,\frac{\partial g}{\partial\theta'}.$$

The estimation of $\frac{\partial g}{\partial\theta}$ and $\Sigma$ and the derivation of the asymptotic distribution are not carried out in this thesis. Interested readers are referred to Appendix A of Yakop (2011).
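The plug-in estimate of the Sharpe ratio amounts to a few lines of arithmetic. The sketch below uses a made-up return series and a hypothetical function name; the benchmark return defaults to zero, as in the evaluation described above.

```python
import math


def sharpe_ratio(returns, benchmark=0.0):
    """Plug-in Sharpe ratio using the 1/T estimators of mean and variance."""
    T = len(returns)
    mu = sum(returns) / T
    var = sum((r - mu) ** 2 for r in returns) / T
    return (mu - benchmark) / math.sqrt(var)


# Made-up daily returns, for illustration only.
example_returns = [0.01, 0.03, -0.01, 0.02, 0.00]
sr = sharpe_ratio(example_returns)
print(f"Sharpe ratio: {sr:.4f}")
```

For this series the mean is 0.01 and the variance is 0.0002, so the ratio equals $0.01 / \sqrt{0.0002} \approx 0.7071$.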

Furthermore, Yakop (2011) discusses two limitations of the use of Sharpe ratios. The first limitation is that the Sharpe ratio implicitly assumes the return series to be normally distributed, or at least approximately so. In practice, pairs trading strategies produce frequent small positive returns with occasional large losses, which inflates the Sharpe ratios because of the excess skewness and kurtosis (Lo, 2002).

The second limitation is that the Sharpe ratio ignores any underlying serial correlation, which is frequently present in financial time series. The consequence of serial correlation is, again, that it results in overestimation of the Sharpe ratios (Lo, 2002). To resolve this issue, the standard deviation of the return series, $\sigma$, is estimated with the Newey-West (1987) heteroskedasticity and autocorrelation consistent estimator of variance (HAC estimator). The HAC estimator is used when calculating the Sharpe ratios of the return series. The derivation of the HAC estimator is not given in this thesis, but can also be found in Appendix A of Yakop (2011).
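One common form of the Newey-West estimator weights the sample autocovariances with Bartlett weights. The sketch below shows that variant; it is an assumption that this matches the exact form used in Yakop (2011), and the function names, the lag choice and the example series are illustrative only.

```python
import math


def newey_west_variance(returns, lags):
    """Long-run variance with Bartlett weights 1 - j / (lags + 1)."""
    T = len(returns)
    mu = sum(returns) / T
    dev = [r - mu for r in returns]

    def autocov(j):
        # Sample autocovariance at lag j, normalized by T.
        return sum(dev[t] * dev[t - j] for t in range(j, T)) / T

    lrv = autocov(0)
    for j in range(1, lags + 1):
        lrv += 2.0 * (1.0 - j / (lags + 1.0)) * autocov(j)
    return lrv


def hac_sharpe_ratio(returns, lags, benchmark=0.0):
    """Sharpe ratio with the HAC standard deviation in the denominator."""
    mu = sum(returns) / len(returns)
    return (mu - benchmark) / math.sqrt(newey_west_variance(returns, lags))


# Made-up, positively autocorrelated returns for illustration.
example = [0.03, 0.03, 0.01, 0.01]
plain = hac_sharpe_ratio(example, lags=0)   # no autocorrelation correction
hac = hac_sharpe_ratio(example, lags=1)
print(plain, hac)
```

With zero lags the estimator reduces to the ordinary sample variance. For a positively autocorrelated series the HAC variance is larger, so the corrected Sharpe ratio is smaller than the uncorrected one.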

5.2 Results

In this section the results of pairs trading with the Stochastic Spread approach and the Cointe-

gration approach are given.

5.2.1 Stochastic Spread Approach

As mentioned in the third chapter, the Stochastic Spread model has three major advantages

from a theoretical point of view. The model captures mean-reversion, is continuous in time and

is completely tractable. Despite these promising properties of the model, the empirical

results turn out to be less favourable.

First of all, it takes a long time to estimate the parameters of one spread, let alone those of
the 276 different spreads available in the AEX (consisting of 24 stocks). To give an indication
of the time needed to estimate these spreads: a single formation period already takes forty-two
minutes. There are seven formation periods in this dataset, so the estimation of all the different
pairs in the dataset would take roughly five hours.
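The figures quoted above can be checked with a few lines of arithmetic. The sketch below only
restates the numbers from the text; the per-spread time is derived under the assumption that the
forty-two minutes cover all 276 spreads of one formation period.

```python
from math import comb

n_stocks = 24                      # stocks in the AEX dataset, as stated in the text
minutes_per_formation_period = 42  # estimation time for one formation period
n_formation_periods = 7

# Number of distinct pairs that can be formed from 24 stocks.
n_pairs = comb(n_stocks, 2)

# Total estimation time over all formation periods, in hours.
total_hours = minutes_per_formation_period * n_formation_periods / 60

# Implied average estimation time per spread, in seconds (assumption: 42 minutes covers all pairs).
seconds_per_pair = minutes_per_formation_period * 60 / n_pairs

print(n_pairs, total_hours, round(seconds_per_pair, 1))
```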

This first disadvantage is inconvenient, but it can be overcome by the use of
faster computers (or patience). However, another disadvantage is more problematic. After the
estimation of all the different spreads, the number of pairs found suitable for pairs trading was
minimal. For example, the first formation period resulted in five suitable pairs. This is not
much, given the fact that there are 276 different pairs available.

Also, the parameters estimated for the pairs selected by this method suggest that the
model can be simplified to a simple AR(1) model for the spread. Specifically, the parameter D
in the state-space equations is estimated to be at most 0.001. This suggests that the state-space
model can be reduced to the state equation, which is just a simple AR(1) model for the spread.
This AR(1) model has already been tested extensively in the context of pairs trading in Yakop
(2011) and will therefore not be analysed further in this thesis.
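The reduction can be written out explicitly. The block below assumes the state-space form of
Elliot et al. (2005), which the model of chapter 3 is taken to follow; the symbols x_k (latent
spread), y_k (observed spread) and the standard normal noise terms ε and ω are notation assumed
here and may differ from the notation used earlier in the thesis.

```latex
\begin{align*}
x_{k+1} &= A + B\,x_k + C\,\varepsilon_{k+1} && \text{(state equation)} \\
y_k     &= x_k + D\,\omega_k                 && \text{(observation equation)}
\end{align*}
With $D \approx 0$ the observation noise vanishes, so that $y_k \approx x_k$ and
\[
y_{k+1} \approx A + B\,y_k + C\,\varepsilon_{k+1},
\]
which is an AR(1) process with long-run mean $A/(1-B)$ for $|B| < 1$.
```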

So, despite its favourable theoretical properties, the stochastic spread model
suggested by Elliot et al. (2005) does not turn out to be a good
approach for pairs trading in practice.

Parameters of selected pairs        Values
Number of possible pairs            276
Average number of selected pairs    5
A                                   0.0062
B                                   0.9845
C                                   0.2660
D                                   0.0007

Table 5.1: Estimation results of Stochastic Spread Approach
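As a rough illustration of what the values in table 5.1 imply, the sketch below treats them as a
single representative parameter set of an AR(1) spread (an assumption; the table may report
averages over pairs) and computes the standard AR(1) quantities: the long-run mean, the half-life
of a deviation in days, and the stationary standard deviation. The variable names are illustrative.

```python
from math import log, sqrt

# Values taken from table 5.1, read as x[k+1] = A + B*x[k] + C*eps, y[k] = x[k] + D*omega (assumed form).
A, B, C, D = 0.0062, 0.9845, 0.2660, 0.0007

long_run_mean = A / (1.0 - B)             # level the spread reverts to
half_life_days = log(0.5) / log(B)        # days until a deviation is expected to halve
stationary_std = C / sqrt(1.0 - B ** 2)   # unconditional standard deviation of the spread
noise_ratio = D / C                       # observation noise relative to state noise

print(round(long_run_mean, 3), round(half_life_days, 1),
      round(stationary_std, 3), round(noise_ratio, 4))
```

The small value of `noise_ratio` is another way of seeing that the observation equation adds
almost nothing to the state equation for these estimates.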

[Figure 5.1: Example of a pair selected with the Stochastic Spread Approach. Panel (a): fitted
values of the spread in the formation period (FP), with B = 0.9827. Panel (b): actual spread in
the trading period.]


5.2.2 Cointegration Approach

Contrary to the stochastic spread approach, the results of the cointegration approach are useful
for evaluating a pairs trading strategy. To begin the evaluation, an overview of the dataset and
the parameters used in the analyses is given in table 5.2. As table 5.2 shows, results are
estimated for three different lengths of the formation period (128, 256 and 512 days,
respectively) and the adjacent trading periods. For each length, all possible pairs (in this case
276) are tested with the Johansen cointegration trace test described in section 2.3 (at a
significance level of 0.05). The average number of pairs found by this test for each formation
period length is also stated in table 5.2.
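For reference, the trace statistic used in this selection step has the following standard form.
The notation (T observations, n = 2 series per pair, ordered estimated eigenvalues) is assumed
here and may differ from section 2.3, and the sequential decision rule is a common reading of the
test rather than a description of the exact procedure used for the results.

```latex
\[
\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\bigl(1 - \hat{\lambda}_i\bigr),
\qquad \hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \dots \ge \hat{\lambda}_n .
\]
For a pair ($n = 2$), the hypothesis of no cointegration ($r = 0$) is rejected at the 5\% level
when $\lambda_{\text{trace}}(0)$ exceeds the corresponding critical value; a pair is then a
candidate for trading if, in addition, $r \le 1$ is not rejected.
```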

Parameter   Description                  Values
D           Number of trading days       1316
S           Number of stocks             23
RW          Rolling window               40
FP          Formation period             128 days   256 days   512 days
TP          Trading period               64 days    128 days   256 days
NT          Number of trading periods    28         23         13
NP          Average number of pairs      19         28         35

Table 5.2: Parameters for trading strategy

The graphs of figure 5.2 show the behaviour of two different pairs during the
formation and trading period. As can be seen from the graphs, both pairs show periods
of divergence and convergence during the formation period. This mean-reverting behaviour is
the key to a profitable pairs trading strategy and is present in all pairs selected in the
formation period. Unfortunately, some of the pairs formed during the formation period will not
show the same behaviour during the trading period (see panel (d)). As a result, losses will be made
on these pairs. For the pairs trading strategy to be a success, the pairs that do show
mean-reverting behaviour should make up for the probable losses incurred on these "bad" pairs.

[Figure 5.2: Example of pairs (x-axis: days; y-axis: spread). Panel (a): spread in the FP,
Aegon and Heineken. Panel (b): spread in the TP, Aegon and Heineken. Panel (c): spread in the FP,
PostNL and Unibail-Rodamco. Panel (d): spread in the TP, PostNL and Unibail-Rodamco.]

Now I will present the main results of the cointegration approach. Table 5.3 contains the
calculated Sharpe ratios of the cointegration approach using the mean-variance optimization
trading strategy. The Sharpe ratios are calculated on the basis of the daily returns and therefore
look small. Conversion of these daily SRs to annual SRs is commonly done by multiplying the
SRs by √250. This is known as 'time aggregation' within finance. Lo (2002), however, shows that
this rule is statistically incorrect because of the serial correlation underlying financial
returns, which can result in extreme overestimation of the SRs. Therefore, only the estimated
daily SRs are included in this thesis.
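The effect of serial correlation on time aggregation can be made concrete with the scaling factor
given by Lo (2002), under which a q-period Sharpe ratio equals η(q) times the one-period Sharpe
ratio. The sketch below is illustrative only: the function name is hypothetical, and
autocorrelations beyond the supplied list are assumed to be zero.

```python
from math import sqrt


def aggregation_factor(q, autocorrelations):
    """Lo (2002) factor eta(q) = q / sqrt(q + 2 * sum_{k=1}^{q-1} (q - k) * rho_k).

    `autocorrelations` holds rho_1, rho_2, ...; missing higher-order terms count as zero.
    """
    weighted = sum(
        (q - k) * rho
        for k, rho in enumerate(autocorrelations[: q - 1], start=1)
    )
    return q / sqrt(q + 2.0 * weighted)


# Without serial correlation the factor is the familiar sqrt(250).
iid_factor = aggregation_factor(250, [])

# With AR(1)-type autocorrelations rho_k = 0.1**k the factor is smaller than sqrt(250).
ar1_factor = aggregation_factor(250, [0.1 ** k for k in range(1, 250)])

print(iid_factor, ar1_factor)
```

With positive autocorrelation the factor falls below √250, so multiplying a daily SR by √250
overstates the annual SR, which is the point made by Lo (2002).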

Furthermore, it has to be noted that the calculation of the daily returns did not incorporate
transaction costs. Including transaction costs in the investigation would require some
creativity, since the difference between the bid and ask price of a stock is not reported (only
the daily closing prices are). The fee for making a transaction is also not commonly known.
Therefore, the inclusion of transaction costs within pairs trading justifies a separate study
and is not dealt with further in this thesis.

As can be seen from the average Sharpe ratios of this strategy, the mean-variance optimization
suffers large losses for all formation period lengths. This is a remarkable result,
since this strategy is supposed to maximize the value of the portfolio. However, one critical
assumption of this strategy is that the selected pairs have the property of mean reversion. If this
assumption is not met and a pair drifts away, the number of positions taken in the spread will
increase dramatically and huge losses will be incurred. The results show that there are too many
pairs that show this behaviour. Therefore, the average Sharpe ratios for the different formation
period lengths are negative.

Benchmark         Descriptive statistics of SRs            Count
SR(AEX)    FP     Average   Max      Min       Std. Dev.   pos. SR   SR > SR(AEX)   Significant at 5%
0.0073     128    -0.0447   0.1461   -0.1523   0.0583      13        5              5
0.0046     256    -0.0444   0.0585   -0.1079   0.0436      8         3              3
0.0439     512    -0.0376   0.0041   -0.0665   0.0227      2         1              0

Table 5.3: SRs for different FP with MV optimized positions

On the other hand, the histogram in figure 5.3 (which shows the distribution of the SRs over
the different TPs) shows that if enough pairs do mean-revert in one formation
period, the SR of that period can be high (an SR of 0.15). Unfortunately, this does not happen
often enough and the overall results of this strategy are disappointing.


[Figure 5.3: Histogram of the estimated SRs of the MV strategy for a formation period length of
128 days (x-axis: SRs; y-axis: frequency).]

To compare the mean-variance strategy with a less risky strategy, I also calculated the Sharpe
ratios using the common two standard deviation (2STD) strategy for opening a position. This
strategy is not as risky as the mean-variance optimization, because it will only open one position
at a time. The results of this strategy are stated in table 5.4. It can be seen that the 2STD
strategy returns positive average Sharpe ratios for the three different formation period lengths,
where the formation period of 128 days has the highest average. In contrast to the mean-variance
strategy, the pairs that do not converge and drift away from the equilibrium will only have
a loss of two times the standard deviation. These losses are more than offset by the pairs that
do behave as expected, which results in the positive average Sharpe ratios for all the different
trading periods.
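The 2STD rule can be sketched in a few lines. This is a simplified illustration, not the
implementation behind table 5.4: it assumes that the mean and standard deviation of the spread are
estimated once on the formation period, that one unit of the spread is traded, and that a position
is closed as soon as the spread is back at or beyond the mean. All names are hypothetical.

```python
from statistics import mean, stdev


def two_std_positions(formation, trading, k=2.0):
    """Return the position per trading day: -1 short the spread, +1 long, 0 flat."""
    mu = mean(formation)
    sigma = stdev(formation)
    upper, lower = mu + k * sigma, mu - k * sigma

    position = 0
    positions = []
    for s in trading:
        if position == 0:
            if s > upper:
                position = -1      # spread too high: short the spread
            elif s < lower:
                position = 1       # spread too low: long the spread
        elif position == -1 and s <= mu:
            position = 0           # spread has reverted: close the short
        elif position == 1 and s >= mu:
            position = 0           # spread has reverted: close the long
        positions.append(position)
    return positions


def spread_pnl(trading, positions):
    """Profit of holding yesterday's position over today's change in the spread."""
    return sum(positions[t - 1] * (trading[t] - trading[t - 1])
               for t in range(1, len(trading)))


formation = [-1.0, 1.0] * 10
trading = [0.5, 2.5, 1.0, -0.1, -3.0, -1.0, 0.2]
positions = two_std_positions(formation, trading)
print(positions)  # [0, -1, -1, 0, 1, 1, 0]
```

Because at most one unit is held at any time, a pair that keeps drifting away costs the strategy
only the move on that single position, in contrast to a rule that adds positions as the spread widens.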



SR(AEX)    FP     Average   Max      Min       Std. Dev.   pos. SR   SR > SR(AEX)   Significant at 5%
0.0073     128    0.0209    0.0667   -0.0227   0.0233      25        15             11
0.0046     256    0.0147    0.0469   -0.0167   0.0221      19        12             9
0.0439     512    0.0081    0.0204   -0.0045   0.0092      10        2              2

Table 5.4: SR results of different FP with 2STD trigger opening of a position

5.3 Results using DAX index

To see if the results of the cointegration approach are robust, a second estimation of the
cointegration approach is done for both trading strategies. The second dataset consists of the
daily closing prices from the last five years of the DAX index (which includes the thirty biggest
listed German companies). The results of both trading strategies are given in table 5.5
below.

As can be seen in table 5.5, the MV strategy performs even worse in this dataset than it
did in the AEX dataset. The average daily SRs of the MV strategy for the different periods are
all negative, and only in one TP does the MV strategy significantly outperform the DAX index
(FP: 128 days). The 2STD strategy (again) performs better than the MV strategy and generates
small positive average SRs in all the trading periods. The results of both pairs trading strategies
are much alike across the two datasets. Therefore, it can be concluded that the results obtained
are robust.


Strategy   SR(DAX)    FP     Average   Max       Min       Std. Dev.   pos. SR   SR > SR(DAX)   Significant at 5%
MV         0.0551     128    -0.0843   -0.0119   -0.1829   0.0486      0         3              1
MV         0.0596     256    -0.0636   0.0165    -0.1056   0.0426      1         3              0
MV         0.0316     512    -0.0429   -0.0225   -0.0536   0.0105      0         1              0
2STD       0.0551     128    0.0210    0.0689    -0.0181   0.0215      17        8              5
2STD       0.0596     256    0.0135    0.0429    -0.0069   0.0159      14        3              3
2STD       0.0316     512    0.0069    0.0148    -0.0045   0.0030      6         3              3

Table 5.5: SR results of MV and 2STD for DAX index

Chapter 6

Conclusion

In this thesis two different model-based approaches for pairs trading were discussed and tested
with the use of two different trading strategies. Results were generated for the daily closing
prices of the stocks in the AEX index over the last five years. Furthermore, an out-of-sample
estimation was done to verify whether the results were robust.

The first approach for modelling the behaviour of a pair, the stochastic spread, was first
suggested (but not yet tested) by Elliot et al. (2005). From a theoretical point of view, the
stochastic spread has three major advantages. The model captures mean-reversion, is continuous
in time and is completely tractable. Despite these theoretical advantages, the empirical results
turn out to be less favourable. First of all, the stochastic spread approach found
very few pairs suitable for trading. Secondly, the estimated parameters of the state-space form of
the model suggested that the model could be simplified to only the state equation (which is just
an AR(1) model). This renders the estimation of the parameters with the EM algorithm and
Kalman filter unnecessary, since the AR(1) model is embedded in the other approach discussed
in this thesis. Therefore, only a few estimates and graphs of the spread, and not actual pairs
trading results, are presented for this approach in this thesis.

The second approach for modelling the behaviour of a pair is the cointegration approach.
The idea of cointegration was already used for pairs trading in earlier work (Yakop (2011),
Vidyamurhty (2004)). The approach in this earlier work, however, is more ad hoc and not
based on the error correction model (ECM), which is normally used in econometric research. In
this thesis the cointegration approach is based on the ECM and the pairs are tested with the
use of the Johansen cointegration test.

Subsequently, two trading strategies for taking a position in the spread were used to calculate
the results. The first one is the two standard deviations strategy (2STD). This strategy is
commonly used in the literature (Yakop, 2011; Vidyamurhty, 2004; Gatev et al., 2006). The
concept of this strategy is very simple: one takes a position in the spread if it is far enough
(two standard deviations) away from the mean and closes the position when the spread returns
to the equilibrium value. The second strategy is called the mean-variance approach (MV). As
the name suggests, the number of positions taken in the spread is determined by a trade-off
between the deviation of the spread from its mean and the variance of the spread. The spread
is expected to revert to the mean and the MV strategy uses this assumption to maximize
the portfolio value by varying the number of positions taken in the spread.

The results of both strategies are in tables 5.4 and 5.3. The 2STD strategy generated
small positive returns over all the different formation periods. This result is typical for a pairs
trading strategy and is thus what one would expect. In contrast, the MV strategy generates
large negative SRs in all the formation periods. This is not what one would expect, because
this strategy aims to maximize the portfolio value by varying the number of positions in the
spread and should, consequently, perform well. However, one crucial assumption for the success
of this strategy, namely mean reversion, is not met by a large number of pairs. The number
of positions drastically increases in these pairs and the losses are substantial. This leads me to
the conclusion that the MV strategy might be too risky (in this case, at least) for pairs trading.
The estimation on the second dataset (DAX index) confirms this, because similar results were
generated. Given the fact that two indices produced similar results, one can conclude that these
results are robust.

Further research in pairs trading should focus on other ways to optimize the trading strategy,
since the MV procedure did not generate the desired results. Furthermore, the inclusion of
transaction costs within pairs trading is a relevant topic that should be taken into account,
but has not yet been investigated. One could also investigate the concept of pairs trading for
more than two securities, such as "triple" or "quadruple" trading. The cointegration approach
discussed in this thesis could be a good way of investigating this topic, since the existence of a
cointegration relation between three or four stocks can be easily tested within this framework.

Bibliography

Baronyan, S. R., Boduroglu, I. I., and Sener, E. (2010). Investigation of Stochastic Pairs Trading
Strategies under different Volatility Regimes. The Manchester School, pages 114–134.

Broda, S. (2011). Financial econometrics slides.

Do, B., Faff, R., and Hamza, K. (2006). A New Approach to Modeling and Estimation for Pairs
Trading. Working Paper, pages 1–30.

Elliot, M. J., van der Hoek, J., and Malcolm, W. (2005). Pairs Trading. Quantitative Finance,

5(3):271–276.

Engle, R. F. and Granger, C. W. (1987). Co-integration and Error Correction: Representation,
Estimation and Testing. Econometrica, 55(2):251–276.

Gatev, E., Goetzmann, W. N., and Rouwenhorst, K. G. (2006). Pairs Trading: Performance of
a Relative-Value Arbitrage Rule. Review of Financial Studies, 19(3):797–827.

Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.

Johansen, S. (1991). Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian

Vector Autoregressive Models. Econometrica, 59(6):1551–1580.

Kalman, R. (1960). A New Approach to Linear Filtering and Prediction Problems. Journal of
Basic Engineering, 82:35–45.

Lo, A. (2002). The Statistics of Sharpe Ratios. Financial Analysts Journal, July/August:36–52.

Markowitz, H. (1952). Portfolio Selection. Journal of Finance, 7(1):77–91.

Sharpe, W. (1966). Mutual Fund Performance. The Journal of Business, 39(1):119–138.

Shumway, R. and Stoffer, D. (1982). An Approach to Time Series Smoothing and Forecasting
using the EM Algorithm. Journal of Time Series Analysis, 3:253–264.

Tsay, R. S. (2010). Analysis of Financial Time Series. John Wiley and Sons, Inc., third edition.


Vidyamurhty, G. (2004). Pairs Trading, Quantitative Methods and Analysis. John Wiley and

Sons, Inc.

Yakop, M. (2011). A Comparative Analysis of Pairs Trading. Master’s thesis, University of

Amsterdam.
