Leverage, asymmetry and heavy tails in the high-dimensional
factor stochastic volatility model∗
Mengheng Li†
Department of Econometrics, VU University Amsterdam, The Netherlands
Marcel Scharth‡
Discipline of Business Analytics, University of Sydney Business School, Australia
Working paper; this version: November 12, 2017
Abstract
There is a rich empirical literature that studies the stochastic volatility (SV) of univariate financial time series whose distribution exhibits asymmetry and heavy tails. Yet the literature focusing on high-dimensional SV models appears to be much scarcer, lacking a general modelling framework and efficient estimation methods due to the "curse of dimensionality". Our contribution is twofold. Firstly, we propose a flexible high-dimensional factor SV model with leverage effect, asymmetry and heavy tails based on errors following the generalised hyperbolic skew Student's t-distribution. With shrinkage, the model leads to different parsimonious forms, and is thus able to disentangle systematic leverage effects and skewness from asset-specific ones. Secondly, we develop a highly efficient Markov chain Monte Carlo estimation procedure that analyses the univariate version of the model using efficient importance sampling. Extension to higher dimensions is straightforward via marginalisation of factors. Computational complexity is shown to be linearly scalable in the number of both factors and assets. We assess the performance of our proposed method via extensive simulation studies using both univariate and multivariate simulated datasets. Finally, we show that the model outperforms other factor models in terms of estimation of value-at-risk and minimum-variance portfolio performance for a U.S. and an Australian portfolio.
Keywords: Markov chain Monte Carlo; Generalised hyperbolic skew Student's t-distribution; Stochastic volatility; Metropolis-Hastings algorithm; Importance sampling; Particle filter; Particle Gibbs; State space model; Time-varying covariance matrix; Factor model
JEL Classification: C11; C32; C53; C55; G32
∗We would like to thank George Tauchen, Richard Gerlach, Gary Koop, Siem Jan Koopman, Frank Kleibergen, Lennart Hoogerheide, Robert Kohn, Charles Bos, Anne Opschoor, and seminar and workshop participants at The University of Sydney Business School, VU University Amsterdam, University of Amsterdam, Tinbergen Institute, the 10th International Conference on Computational and Financial Econometrics (Seville, 2016), the 10th Society of Financial Econometrics Annual Conference (New York, 2017), the 1st International Conference on Econometrics and Statistics (Hong Kong, 2017), and the 8th European Seminar on Bayesian Econometrics (Maastricht, 2017) for useful comments and helpful suggestions on previous versions of this paper. Any remaining errors are ours alone.
†Email: [email protected]; contact author
‡Email: [email protected]
1 Introduction
Time-varying volatility and leverage effects, two of the so-called "stylised facts", are often the focus of research on financial return time series, which are also believed to be asymmetrically distributed with heavy tails. The rich literature studying financial time series provides strong econometric evidence supporting these empirical findings. There are two major classes of models.
One is parameter-driven stochastic volatility (SV) models and the other is observation-driven
(generalised) autoregressive conditional heteroskedasticity (GARCH) models. Kim et al. (1998) provide a classical comparison between the two classes of models in terms of filtering, estimation and forecasting performance. They find that the Gaussian SV model fits empirical data similarly to the GARCH model with Student's t-errors. Carrasco and Chen (2002) derive detailed statistical properties of these two classes of models, including mixing properties and (un)conditional distributions characterised by finite moments. Over the last decade, research has shifted from statistical analysis to more detailed modelling techniques that aim at capturing "stylised facts" including not only time-varying volatility but also leverage effects, left skewness and heavy-tailedness of financial series. To this end, new classes of both SV and GARCH models
have been developed. Among many others, Shephard and Pitt (1997) and Durbin and Koopman (1997) develop similar simulated likelihood estimation procedures to estimate SV models with Student's t-errors. The observation-driven counterpart is the GARCH-t model, first developed by Bollerslev (1987) three decades ago. The leverage effect corresponds to the negative correlation between past returns and future volatility. The GARCH-M model of French et al. (1987) and
EGARCH model of Nelson (1991) extend the conditional structure of time-varying variance to
model the negative correlation. Koopman and Hol Uspensky (2002) and Yu (2005) discuss ways of modelling the leverage effect in SV models; the former also provides an efficient simulated likelihood estimation method, and the latter additionally shows that the leverage effect may be the cause of skewness in the return distribution.
A recent SV model proposed by Nakajima and Omori (2012) provides a modelling framework which nests time-varying volatility with leverage effects and a heavy-tailed, skewed error distribution, based on the Gaussian mixture representation of Aas and Haff (2006)'s generalised hyperbolic skew Student's t-distribution. Our paper builds on these two strands of research, which inspire us to propose a new estimation procedure that delivers more efficient inference. The estimation of time-varying volatility models is straightforward if the model is observation-driven, like all variants of GARCH models. It becomes more difficult if the
model is parameter-driven, i.e. for SV models, which usually boil down to non-linear state space models without an analytical likelihood function. It is recognised that simulated likelihood is applicable to simple SV models, but that it suffers from a flat likelihood surface, multimodality and other numerical issues when the model becomes more complex. In such cases, the Bayesian approach provides a sound alternative and is widely used because it offers standard sampling procedures and makes inference straightforward. Several ways of sampling the latent SV process
from its posterior distribution have been proposed, among which the multi-move sampler of
Shephard and Pitt (1997) and Watanabe and Omori (2004) and the auxiliary particle filter of
Pitt and Shephard (1999a) are the most widely used methods. For a general discussion on Bayesian
estimation of SV models we refer to Jacquier et al. (2004) and the references therein. These
methods fall within the broader category of sequential Markov chain Monte Carlo methods
detailed in Doucet et al. (2001). Another sampling method which this paper partially builds
upon is the efficient importance sampling (EIS) originally developed by Richard and Zhang
(2007). EIS is based on a carefully-constructed globally optimal importance density instead
of a locally optimal proposal which is used by the multi-move sampler and auxiliary particle
filter. Scharth and Kohn (2016) develop a highly efficient and stable algorithm called particle
efficient importance sampling (PEIS). As the name suggests, PEIS evaluates an intractable but
unbiasedly estimable likelihood function via combination of EIS and the sequential particle filter.
This paper refines PEIS in the context of a modified Gibbs sampler (Lindsten et al., 2014)
and applies it to model high-dimensional SV models, a field of research where literature appears
to be much scarcer than for univariate SV models. Multivariate models with time-varying volatility are often difficult to estimate due to the "curse of dimensionality", namely that the number of parameters grows exponentially with the number of assets. Cornerstones of multivariate observation-driven time-varying volatility models include, but are not limited to, the constant conditional correlation (CCC) GARCH model of Bollerslev (1990), which models
time-varying covariance matrix with constant correlation among assets. Engle (2002) extends
CCC-GARCH model with dynamic conditional correlation (DCC) and shows its applicability
in terms of estimation and forecasting. A GARCH model with dynamic conditional structure
for the vectorised covariance matrix (VGARCH) is studied by Bollerslev et al. (1994). All these types of models are widely available in commercial packages, but the dimension considered barely exceeds 20 except for the VGARCH model. Low-dimensional models apparently cannot be of much help to quantitative mutual funds or quant hedge funds (Dempster et al., 2008), which continue to gain popularity in recent years thanks to advances in computational power. A report by Vardi (2015) finds that quant funds usually hold tens and even
hundreds of positions in their portfolios, highlighting the need for a high-dimensional multivariate model for risk and investment management. An attempt to achieve this with observation-driven
models comes from a new class of generalised autoregressive score (GAS) models developed by
Creal et al. (2012) and Oh and Patton (2017). Promising results and successful applications in
high-dimensional models have been documented.
In the parameter-driven world, univariate SV models can be extended to multivariate ones in a straightforward manner; however, the difficult estimation usually hampers their practical use (Chib et al., 2009). In such cases, Bayesian estimation is typically employed.
For example, in low-dimensional models Danielsson (1998) and Asai et al. (2006) thoroughly
survey developments in sampling the latent volatility process with comparisons among different
model specifications. Liesenfeld and Richard (2006) apply EIS to a portfolio with four assets,
leaving high-dimensional applications to future research. As far as we know, Pitt and Shephard
(1999b) and Chib et al. (2006) are among the earliest who manage to model high-dimensional
financial time series with distinctive SV series pertaining to every individual equity return, and
they propose to model correlation via latent dynamic factors which also serve as systematic
measure of market movements. Nakajima (2015) extends the univariate model of Nakajima and
Omori (2012) to a factor-free high-dimensional framework, which addresses the leverage effect,
skewness and heavy tails of individual asset’s error distribution.
To resolve the dimensionality issue, this paper proposes a flexible high-dimensional factor SV model. We address the leverage effect and model asymmetry and heavy tails based on generalised hyperbolic skew Student's t-errors, which complements existing studies and discussions.
Importantly, we introduce shrinkage to the model, resulting in automated model selection. The resulting parsimonious form is expected to disentangle leverage effects and asymmetry in the idiosyncratic noise from those in the factors. A highly efficient Markov chain Monte Carlo estimation procedure, which uses EIS to exploit the Gaussian mixture representation of the error distribution, is proposed to analyse the univariate version of the model. The sampling scheme of the full model is simplified via marginalisation of the factors and boils down to the estimation of many univariate series, which can be done in parallel. As a result, the high-dimensional model is able to
achieve efficiency comparable to a univariate model. We assess the performance of our proposed
method via simulation studies with both univariate and multivariate simulated data. Finally, the model is applied to two portfolios consisting of equity returns from the S&P 100 and the ASX 50. Comparisons with other factor models are carried out in terms of estimation of value-at-risk
(VaR) and minimum-variance portfolio performance.
Our discussion is organized as follows. Section 2 introduces the model setting and our
proposed Bayesian estimation method including the use of EIS in the context of particle Gibbs
with ancestor sampling. Section 3 details the methods of evaluating marginal likelihood based on
an efficient particle filtering algorithm combined with importance sampling of hyperparameters.
Section 4 starts with a simulation study of the univariate model in comparison with the method of Nakajima and Omori (2012), followed by a simulation study of the high-dimensional factor model that assesses estimation efficiency and the performance of the marginal likelihood and Bayes factor criterion in choosing the right number of factors. Section 5 illustrates our empirical application to VaR and dynamic portfolio management. We conclude in Section 6.
2 Model and Bayesian estimation
2.1 Univariate stochastic volatility model
Nakajima and Omori (2012) introduce the following univariate stochastic volatility model with leverage, using generalised hyperbolic skew Student's t-errors:
$$
\begin{aligned}
y_t &= \nu_t \exp(h_t/2), && t = 1, \ldots, T, \\
\nu_t &= \alpha + \beta W_t + \sqrt{W_t}\,\varepsilon_t, && t = 1, \ldots, T, \\
h_{t+1} &= \mu(1-\phi) + \phi h_t + \eta_t, && t = 1, \ldots, T-1, \\
\begin{pmatrix} \varepsilon_t \\ \eta_t \end{pmatrix} &\sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix} \right), && t = 1, \ldots, T, \\
W_t &\sim IG\left(\tfrac{\zeta}{2}, \tfrac{\zeta}{2}\right), && t = 1, \ldots, T,
\end{aligned} \tag{1}
$$
where $y_t$ is the time series of equity returns, $h_t$ is the unobserved log-volatility modelled as a stationary AR(1) process with initialisation $h_1 \sim N(\mu, \sigma^2/(1-\phi^2))$, and $\nu_t$ follows the generalised
hyperbolic skew Student’s t-distribution. ρ models the leverage effects often found to be negative
in financial returns (Yu, 2005)1, which indicates that a drop in equity return likely leads to an
increase in its volatility. IG denotes the inverse Gamma distribution, and the mixing random
variable Wt is introduced to jointly model asymmetry and heavy tails in yt. We choose α =
−βζ/(ζ−2) so that E(νt) = 0 and restrict ζ > 4 to ensure νt has a finite variance. The skewness
1We adapt the definition of leverage effect in Yu (2005), i.e. the correlation between the idiosyncratic error $\nu_t$ and the SV innovation $\eta_t$. $\rho$ itself is thus not the leverage effect.
and heavy-tailedness of $\nu_t$ are jointly determined by the asymmetry parameter $\beta$ and the degrees of freedom $\zeta$. Figure 1 shows different shapes of $\nu_t$'s density for various $\beta$ and $\zeta$ values. Readers
Figure 1: Different density shapes of generalised hyperbolic skew Student’s t-distribution. Left:
varying β with ζ = 10; right: varying ζ with β = −2.
can refer to Aas and Haff (2006) for a detailed account of generalised hyperbolic skew Student’s
t-distribution including its density function fν , the p-th moment E(|ν|p), and an EM algorithm
for parameter estimation. $\beta = 0$ corresponds to a symmetric Student's t-distribution for $\nu_t$, and to a standard normal distribution if $\zeta$ further becomes large. As argued by Aas and Haff (2006), a
unique feature of the model for $\nu_t$ is its tail behaviour:

$$f_\nu(\nu) \propto |\nu|^{-\zeta/2-1} \exp(-|\beta\nu| + \beta\nu) \quad \text{as } \nu \to \pm\infty.$$
This means that $f_\nu$ has one heavy and one semi-heavy tail, unlike many other forms of skew Student's t-distribution whose tails both decay polynomially, making it an appealing model for financial data.
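The variance-mean mixture representation in model (1) makes this error distribution straightforward to simulate. The sketch below is our own illustration, not the authors' code; the parameter values are arbitrary. It draws $\nu_t$ via the inverse Gamma mixture and checks that the centring $\alpha = -\beta\zeta/(\zeta-2)$ gives a zero mean, while $\beta < 0$ produces left skewness:

```python
import numpy as np

rng = np.random.default_rng(0)
beta, zeta = -2.0, 10.0      # illustrative values, as in the left panel of Figure 1
n = 200_000

# W ~ IG(zeta/2, zeta/2): if G ~ Gamma(shape = zeta/2, rate = zeta/2), then 1/G ~ IG
W = 1.0 / rng.gamma(shape=zeta / 2, scale=2.0 / zeta, size=n)
alpha = -beta * zeta / (zeta - 2)            # centring so that E(nu) = 0
nu = alpha + beta * W + np.sqrt(W) * rng.standard_normal(n)

skew = np.mean((nu - nu.mean()) ** 3) / nu.std() ** 3   # negative for beta < 0
```

For $\beta = -2$ and $\zeta = 10$ the sample mean is close to zero while the sample skewness is markedly negative, consistent with one heavy left tail and one semi-heavy right tail.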
Assuming $|\phi| < 1$ and that $E(|\nu_t|^p)$ exists, the unconditional $p$-th moment of $y_t$ in model (1) is

$$E(|y_t|^p) = \exp\left( \frac{\sigma^2 p^2}{8(1-\phi^2)} + \frac{\mu p}{2} \right) E(|\nu_t|^p).$$
Notice that model (1) implies conditional time-varying leverage effects. Given $W_t$, one has

$$\mathrm{Cov}(\nu_t, \eta_t) = \sqrt{W_t}\,\rho\sigma.$$

This means that if one interprets $W_t$ as a "shock variable", such a shock has a multiplicative effect
on the leverage. In the Appendix we show that unconditionally the leverage effect $\mathrm{Corr}(\nu_t, \eta_t) = \mathrm{Le}(\beta, \zeta)\rho$ has the following multiplier:

$$\mathrm{Le}(\beta, \zeta) = \frac{\Gamma\left(\frac{\zeta-1}{2}\right)}{\Gamma\left(\frac{\zeta}{2}\right)} \sqrt{\frac{(\zeta-2)^2(\zeta-4)}{2\zeta^2 + (4\beta^2 - 12)\zeta + 16}}, \quad \zeta > 4. \tag{2}$$
Basic algebra shows $\mathrm{Le}(\beta, \zeta) \in (0, 1)$ for all $\beta, \zeta \in \mathbb{R}$, with $\partial \mathrm{Le}/\partial \zeta > 0$, $\partial^2 \mathrm{Le}/\partial \zeta^2 < 0$, $\partial \mathrm{Le}/\partial |\beta| < 0$, and $\partial^2 \mathrm{Le}/\partial \beta^2 < 0$.
Given $\beta$, when $\zeta$ becomes large the density of $\nu_t$ is less skewed and has lighter tails, so $\mathrm{Le}(\beta, \zeta)$ tends to one, i.e. the leverage effect tends to $\rho$, similar to the case of a standard SV model with normal errors. Given $\zeta > 4$, the magnitude of leverage decreases to zero as $|\beta|$ grows, even though $\rho \neq 0$. This feature implies that if the return innovation $\nu_t$ puts a large weight on the "shock variable" $W_t$ (i.e. large $|\beta|$), the leverage effect vanishes.
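Equation (2) is easy to evaluate numerically. The short sketch below (our own check; the grid values are arbitrary) implements $\mathrm{Le}(\beta, \zeta)$ and illustrates the properties just listed:

```python
from math import gamma, sqrt

def Le(beta, zeta):
    """Leverage multiplier of equation (2); requires zeta > 4."""
    return (gamma((zeta - 1) / 2) / gamma(zeta / 2)
            * sqrt((zeta - 2) ** 2 * (zeta - 4)
                   / (2 * zeta ** 2 + (4 * beta ** 2 - 12) * zeta + 16)))
```

For instance, Le(0, 10) ≈ 0.969, so with near-symmetric errors the unconditional leverage is close to $\rho$, while Le(2, 10) ≈ 0.594 shows how a large $|\beta|$ dampens it.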
We develop an MCMC algorithm which partially builds on Nakajima and Omori (2012), who argue that the Gaussian variance-mean mixture representation of $\nu_t$ in the second line of model (1) allows for a conditional sampler, but ours is believed to be more efficient and computationally faster. The density functions of the inverse Gamma and normal distributions are log-linear, indicating the possibility of building globally optimal importance densities for $W_t$ and $h_t$ via the EIS method of Richard and Zhang (2007). In Nakajima and Omori (2012), a modified multi-move sampler from Shephard and Pitt (1997) is used to sample $h_t$ block-by-block conditional on $W_t$, with a local Laplace approximation to the posterior density (see also Watanabe and Omori (2004) and Takahashi et al. (2009)). Later we show that efficiency is further improved with a novel particle Gibbs algorithm based on the EIS importance density, which samples $h_t$ and $W_t$ as one block. The next section details our MCMC algorithm.
2.2 Estimation of the univariate model
Let $\theta = (\sigma, \rho, \phi, \mu, \beta, \zeta)$ collect the hyperparameters, and let $x_{t_1:t_2}$ denote the history of a process $x_s$ from $s = t_1$ to $t_2$. The MCMC algorithm developed below boils down to a Metropolis-within-Gibbs procedure (e.g., Gilks et al., 1995; Geweke and Tanizaki, 2001; Koop et al., 2007) which samples from the posterior distribution of $(\theta, h_{1:T}, W_{1:T}) \mid y_{1:T}$ for model (1). The algorithm iterates over

1. sampling $(h_{1:T}, W_{1:T}) \mid y_{1:T}, \theta$;

2. sampling $\theta \mid y_{1:T}, h_{1:T}, W_{1:T}$.
2.2.1 Sampling $(h_{1:T}, W_{1:T}) \mid y_{1:T}, \theta$
We aim to improve efficiency by sampling the latent processes $h_t$ and $W_t$ as one block. For notational simplicity, the dependence on $\theta$ is suppressed. $p(\cdot)$ generically denotes a density function, possibly with a subscript indicating a specific distribution.
Model (1) is a non-linear non-Gaussian state space model, and reformulating the model to tackle the leverage effect gives

$$
\begin{aligned}
y_t &= (\alpha + \beta W_t + \sqrt{W_t}\,\varepsilon_t)\, e^{h_t/2}, && t = 1, \ldots, T, \\
h_{t+1} &= \mu(1-\phi) + \phi h_t + \rho\sigma\varepsilon_t + \sqrt{1-\rho^2}\,\sigma\eta_t^*, && t = 1, \ldots, T-1,
\end{aligned}
$$

where $\varepsilon_t = (y_t e^{-h_t/2} - \alpha - \beta W_t)/\sqrt{W_t}$, and $\eta_t^*$ is standard normal and independent of $\varepsilon_t$. Notice that $\varepsilon_t \in \mathcal{F}_t$, where $\mathcal{F}_t$ is the filtration generated by both the observables $y_{1:t}$ and the unobservables $h_{1:t}$ and $W_{1:t}$, such that the model is Markovian and $y_t$ forms a martingale difference sequence, allowing factorisation of the likelihood via likelihood contributions.
Introducing $x_t = (h_t, W_t)'$, the likelihood is given by the integral

$$L(y_{1:T}) = \int p(y_{1:T}, x_{1:T})\,dx_{1:T} = \int p(y_1|x_1)p(x_1) \prod_{t=2}^{T} p(y_t|x_t)\,p(x_t|x_{t-1}, y_{t-1})\,dx_{1:T}, \tag{3}$$

where the transition density for $t = 2, \ldots, T$ follows

$$p(x_t|x_{t-1}, y_{t-1}) = p_N(h_t|h_{t-1}, y_{t-1}, W_{t-1})\, p_{IG}(W_t) = N\!\left(h_t;\, \mu(1-\phi) + \phi h_{t-1} + \rho\sigma\varepsilon_{t-1},\, (1-\rho^2)\sigma^2\right) \cdot IG\!\left(W_t;\, \tfrac{\zeta}{2}, \tfrac{\zeta}{2}\right). \tag{4}$$
The efficient importance sampling (EIS) method of Richard and Zhang (2007), further studied by e.g. Jung and Liesenfeld (2001) and Scharth and Kohn (2016), proposes the following importance sampler:

$$q(x_{1:T}|y_{1:T}) = q(x_1|y_{1:T}) \prod_{t=2}^{T} q(x_t|x_{t-1}, y_{1:T}),$$

with the conditional density $q(x_t|x_{t-1}, y_{1:T})$ for $t = 2, \ldots, T$ written as

$$q(x_t|x_{t-1}, y_{1:T}) = \frac{k_q(x_t, x_{t-1}; \delta_t)}{\chi_q(x_{t-1}; \delta_t)}, \quad \text{with } \chi_q(x_{t-1}; \delta_t) = \int k_q(x_t, x_{t-1}; \delta_t)\,dx_t.$$

Here $k_q(x_t, x_{t-1}; \delta_t)$ is a kernel in $x_t$ with integration constant $\chi_q(x_{t-1}; \delta_t)$, and $\delta_t$ is a set of importance parameters with every element being a function of $y_{1:T}$. At the initial period, the importance density is simply

$$q(x_1|y_{1:T}) = \frac{k_q(x_1; \delta_1)}{\chi_q(\delta_1)}, \quad \text{with } \chi_q(\delta_1) = \int k_q(x_1; \delta_1)\,dx_1.$$
Using the above importance density, the likelihood (3) can be expressed as

$$
\begin{aligned}
L(y_{1:T}) &= \int \frac{p(y_1|x_1)p(x_1)}{q(x_1|y_{1:T})} \prod_{t=2}^{T} \frac{p(y_t|x_t)\,p(x_t|x_{t-1}, y_{t-1})}{q(x_t|x_{t-1}, y_{1:T})}\, q(x_{1:T}|y_{1:T})\,dx_{1:T} \\
&= \chi_q(\delta_1) \int \frac{p(y_1|x_1)p(x_1)}{k_q(x_1; \delta_1)/\chi_q(x_1; \delta_2)} \prod_{t=2}^{T} \frac{p(y_t|x_t)\,p(x_t|x_{t-1}, y_{t-1})}{k_q(x_t, x_{t-1}; \delta_t)/\chi_q(x_t; \delta_{t+1})}\, q(x_{1:T}|y_{1:T})\,dx_{1:T},
\end{aligned} \tag{5}
$$

starting from $\chi_q(x_T; \delta_{T+1}) = 1$.
EIS is particularly suitable in our case because both the inverse Gamma and the normal distribution belong to the exponential family, which is closed under multiplication. This means that one can choose an importance kernel conjugate to the transition density (4), namely

$$k_q(x_t, x_{t-1}; \delta_t) = k(x_t, x_{t-1}; \delta_t) \cdot k_p(x_t, x_{t-1}; y_{t-1}),$$

where

$$k_p(x_t, x_{t-1}; y_{t-1}) = p(x_t|x_{t-1}, y_{t-1})\,\chi_p(x_{t-1}; y_{t-1}), \quad \text{with } \chi_p(x_{t-1}; y_{t-1}) = \int k_p(x_t, x_{t-1}; y_{t-1})\,dx_t.$$
The likelihood (5) then becomes

$$\chi_q(\delta_1) \int \frac{p(y_1|x_1)\,\chi_q(x_1; \delta_2)/\chi_p(\cdot)}{k(x_1; \delta_1)} \prod_{t=2}^{T} \frac{p(y_t|x_t)\,\chi_q(x_t; \delta_{t+1})/\chi_p(x_{t-1}; y_{t-1})}{k(x_t, x_{t-1}; \delta_t)}\, q(x_{1:T}|y_{1:T})\,dx_{1:T},$$

where $\chi_p(\cdot)$ corresponds to the integration constant with respect to the unconditional distribution $N(h_1; \mu, \sigma^2/(1-\phi^2))$ and $IG(W_1; \tfrac{\zeta}{2}, \tfrac{\zeta}{2})$. It follows from the transition density $p_N(h_t|h_{t-1}, y_{t-1}, W_{t-1})$ and $p_{IG}(W_t)$ in (4) that

$$k_p(x_t, x_{t-1}; y_{t-1}) = \exp\left( \frac{\mu(1-\phi) + \phi h_{t-1} + \rho\sigma\varepsilon_{t-1}}{(1-\rho^2)\sigma^2}\, h_t - \frac{h_t^2}{2(1-\rho^2)\sigma^2} \right) \cdot W_t^{-\zeta/2-1} \exp\left( -\frac{\zeta}{2} W_t^{-1} \right).$$
Let $\delta_t = (b_t, c_t, s_t, r_t)$. For conjugacy we choose the following kernel:

$$k(x_t, x_{t-1}; \delta_t) = \exp\left( b_t h_t - \tfrac{1}{2} c_t h_t^2 \right) \cdot W_t^{-s_t} \exp\left( -r_t W_t^{-1} \right), \tag{6}$$
with the ratio of integration constants given by

$$\frac{\chi_q(x_t; \delta_{t+1})}{\chi_p(x_{t-1}; y_{t-1})} = \sqrt{\frac{v_t}{(1-\rho^2)\sigma^2}} \exp\left( \frac{1}{2}\left( \frac{\mu_t^2}{v_t} - \frac{(\mu(1-\phi) + \phi h_{t-1} + \rho\sigma\varepsilon_{t-1})^2}{(1-\rho^2)\sigma^2} \right) \right) \times \frac{\Gamma(\zeta/2 + s_t)}{(\zeta/2 + r_t)^{\zeta/2 + s_t}}\, \frac{(\zeta/2)^{\zeta/2}}{\Gamma(\zeta/2)},$$

where

$$v_t = \frac{(1-\rho^2)\sigma^2}{1 + (1-\rho^2)\sigma^2 c_t}, \quad \text{and} \quad \mu_t = v_t \left( b_t + \frac{\mu(1-\phi) + \phi h_{t-1} + \rho\sigma\varepsilon_{t-1}}{(1-\rho^2)\sigma^2} \right). \tag{7}$$
The choice of kernel (6) corresponds to an importance density that is the product of a normal density with mean $\mu_t$ and variance $v_t$ defined in (7) and an inverse Gamma density with shape parameter $\zeta/2 + s_t$ and rate parameter $\zeta/2 + r_t$, i.e.

$$q(x_t|x_{t-1}, y_{1:T}) = N(h_t; \mu_t, v_t) \cdot IG\left( W_t;\, \tfrac{\zeta}{2} + s_t,\, \tfrac{\zeta}{2} + r_t \right). \tag{8}$$
The set of importance parameters $\delta_t$ is determined iteratively via a sequence of auxiliary least squares regressions. Briefly, given $\delta_t^{(n)}$, $J$ trajectories of $x_t^{(j)} = (h_t^{(j)}, W_t^{(j)})'$ for $j = 1, \ldots, J$ can be drawn using (8). At each $t$, $\delta_t^{(n+1)}$ is determined by solving the following least squares regression:

$$\delta_t^{(n+1)} = \arg\min_{\delta_t} \sum_{j=1}^{J} \left[ \left( \log p(y_t|x_t^{(j)}) + \log \frac{\chi_q(x_t^{(j)}; \delta_{t+1}^{(n+1)})}{\chi_p(x_{t-1}^{(j)}; y_{t-1})} \right) - \left( \gamma_t + \log k(x_t^{(j)}, x_{t-1}^{(j)}; \delta_t) \right) \right]^2, \tag{9}$$

where $\gamma_t$ is a normalising constant. Effectively, EIS finds the minimiser $\delta_t$ of the variance of the ratio, or importance weight,

$$\frac{p(y_t|x_t)\,\chi_q(x_t; \delta_{t+1})/\chi_p(x_{t-1}; y_{t-1})}{k(x_t, x_{t-1}; \delta_t)} = \frac{p(y_t|x_t)\,p(x_t|x_{t-1}, y_{t-1})}{k_q(x_t, x_{t-1}; \delta_t)/\chi_q(x_t; \delta_{t+1})}. \tag{10}$$
Because exponential family kernels such as (6) are log-linear, the regression is a basic OLS with regressors $(h_t^{(j)}, (h_t^{(j)})^2, -\log W_t^{(j)}, -1/W_t^{(j)})$. As shown by Richard and Zhang (2007) and Scharth and Kohn (2016), the backward shift of the period-$(t+1)$ integration constant $\chi_q(x_t; \delta_{t+1})$ is crucial for obtaining a globally efficient importance density, as it depends on both the lagged and future states.
Once the importance density is determined, we apply particle Gibbs with ancestor sampling (PGAS), originally developed by Lindsten et al. (2014), to sample $x_{1:T}$ from $p(x_{1:T}|y_{1:T})$, i.e. to sample $(h_{1:T}, W_{1:T}) \mid y_{1:T}$. Because EIS is employed inside PGAS, we term our sampler EIS-PGAS; it belongs to the bigger family of particle Gibbs (PG) samplers (see e.g., Chopin et al., 2013) based on particle filtering or sequential Monte Carlo methods (see Pitt and Shephard, 1999a, and Doucet et al., 2001, for a general discussion).
The first step is to generate a particle system containing $M$ particles $\{x_{1:t-1}^i\}_{i=1}^M$ and associated weights $\{\omega_{t-1}^i\}_{i=1}^M$ to recursively approximate the filtering distribution $p(x_{1:t-1}|y_{1:t-1})$ by a sum of Dirac delta functions $D(\cdot)$, i.e.

$$p(x_{1:t-1}|y_{1:t-1}) = \sum_{i=1}^{M} \frac{\omega_{t-1}^i}{\sum_{j=1}^{M} \omega_{t-1}^j}\, D(x_{1:t-1} - x_{1:t-1}^i).$$
Secondly, at time $t$, EIS-PGAS updates the particle system by sampling $\{a_t^i, x_t^i\}_{i=1}^M$ independently from

$$I_t(a_t, x_t) = \frac{\omega_{t-1}^{a_t}}{\sum_{j=1}^{M} \omega_{t-1}^j}\, q(x_t|x_{t-1}^{a_t}, y_{1:T}),$$

with $a_t$ indexing the ancestor particle, i.e. $x_{1:t}^i = (x_{1:t-1}^{a_t^i}, x_t^i)$.² Notice the "ancestor-weighted"
importance density $I_t(a_t, x_t)$ depends on the whole particle system up to time $t-1$. Let $x_{1:T}^\star$ denote the reference trajectory, which is a previous draw from $p(x_{1:T}|y_{1:T})$. The particle system is then augmented with $x_{1:T}^\star$ by assigning $x_t^{M+1} = x_t^\star$ in the next step. EIS-PGAS differs from standard PG because it also samples the ancestor of the reference trajectory according to

$$p(a_t^{M+1} = i) = \frac{\omega_{t-1}^i\, p(x_t^\star|x_{t-1}^i, y_{t-1})}{\sum_{j=1}^{M+1} \omega_{t-1}^j\, p(x_t^\star|x_{t-1}^j, y_{t-1})}, \tag{11}$$

and then the history of the reference trajectory is "rewritten" by setting $x_{1:t}^{M+1} = (x_{1:t-1}^{a_t^{M+1}}, x_t^{M+1})$. The recursion for each $t$ is finished by re-weighting the augmented system according to

$$\omega_t^i = \frac{p(y_t|x_t^i)\, p(x_t^i|x_{t-1}^i, y_{t-1})}{q(x_t^i|x_{t-1}^i, y_{1:T})}, \quad \text{for } i = 1, \ldots, M+1. \tag{12}$$

Once $t = T$, the last step of EIS-PGAS is to sample a new path $x_{1:T}^+$ from

$$p(x_{1:T}|y_{1:T}) = \sum_{i=1}^{M+1} \frac{\omega_T^i}{\sum_{j=1}^{M+1} \omega_T^j}\, D(x_{1:T} - x_{1:T}^i), \tag{13}$$
which serves as the reference trajectory $x_{1:T}^\star$ in the next MCMC run. The rearrangement of the reference trajectory in (11) comes directly from Bayesian updating of the probability that $x_{t-1}^i$ is the ancestor of $x_t^\star$, with the prior belief of this probability being $\omega_{t-1}^i$. The Bayesian updating effectively breaks the reference trajectory into pieces. As a result, $x_{1:T}^+$ is substantially different from $x_{1:T}^\star$ with high probability, thus improving mixing compared to standard PG.
²Initialisation of the system is straightforward via sampling from the unconditional distribution of $x_t$.
Let the sequence of densities $p(x_t|x_{t-1}, y_{t-1})$ in (4) for $t = 1, \ldots, T$ be defined on the measurable space $(\mathcal{X}_{1:T}, \mathcal{F}_{1:T})$ with parameters $\theta \in \Theta$. EIS-PGAS defines a Markov kernel $K_\theta^M$ on $(\mathcal{X}_{1:T}, \mathcal{F}_{1:T})$ that maps $x_{1:T}^\star$ stochastically into $x_{1:T}^+$ for any $M \geq 0$. It is easy to show that EIS-PG (i.e. EIS-PGAS without the ancestor sampling step (11)) leads to a reversible and ergodic Markov kernel on $(\mathcal{X}_{1:T}, \mathcal{F}_{1:T})$ according to Chopin et al. (2013). Based on results in Lindsten et al. (2014), the following theorem shows the invariance property of EIS-PGAS.

Theorem 1. The EIS-PGAS kernel $K_\theta^M$ parametrised by $\theta \in \Theta$ with any $M \geq 0$ leaves the posterior probability density function $p(x_{1:T}|y_{1:T})$ invariant:

$$\int_B p(x_{1:T}|y_{1:T})\,dx_{1:T} = \int K_\theta^M(x_{1:T}^\star, B)\, p(x_{1:T}^\star|y_{1:T})\,dx_{1:T}^\star, \quad \forall B \in \mathcal{F}_{1:T}.$$
In the next section, we give the MCMC algorithm for sampling $\theta$ given $x_{1:T}$, i.e. $(h_{1:T}, W_{1:T})$, and $y_{1:T}$. For the sampler to converge, $K_\theta^M$ is required to be ergodic. Making the dependence on $\theta$ explicit for the importance weights in (12) and assuming a boundedness condition for $\omega_{\theta,t}^i$, we can establish uniform ergodicity of the EIS-PGAS kernel in the following theorem.
Theorem 2. Suppose that for any $t = 1, \ldots, T$ and $\theta \in \Theta$, given $\{x_{1:t-1}^i\}_{i=1}^{M+1}$ and $B \in \mathcal{F}_{1:T}$,

$$\sup_{x_t} \left( \max_i \omega_{\theta,t}^i \right) \leq \bar{\omega}_\theta < \infty.$$

Then for any $M \geq 1$ and $\theta \in \Theta$, there exists some $\varphi \in [0, 1)$ such that

$$\left\| (K_\theta^M)^n(x_{1:T}^\star, B) - \int_B p(x_{1:T}|y_{1:T})\,dx_{1:T} \right\|_{TV} \leq p(y_{1:T}) \left( \frac{N-1}{N}\, \bar{\omega}_\theta \right)^T \varphi^n.$$
The Appendix discusses the above two theorems in more detail. Based on a Monte Carlo study in Section 4, in comparison with the method proposed by Nakajima and Omori (2012), we show that the ancestor sampling improves the mixing of the Markov chain for the latent states $(h_{1:T}, W_{1:T})$ as well as for the hyperparameters $\theta$. Moreover, the incorporation of EIS further improves efficiency, supporting the boundedness condition in Theorem 2. It is largely this joint contribution to efficiency through EIS-PGAS that allows us to study the high-dimensional factor stochastic volatility model.
2.2.2 Sampling $\theta \mid y_{1:T}, h_{1:T}, W_{1:T}$
Let $\pi_0(\cdot)$ and $\pi(\cdot|\cdot)$ denote the prior and posterior distributions, respectively, unless the conditioning set is stated otherwise. The sampling procedure for $\theta = (\sigma, \rho, \phi, \mu, \beta, \zeta)$ described below mainly follows Nakajima and Omori (2012), with the exception of the sampling of $\sigma$ and $\rho$.
(i). Sampling the autoregressive coefficient $\phi$. Given the rest, the conditional posterior distribution is

$$\pi(\phi|\cdot) \propto \pi_0(\phi)\sqrt{1-\phi^2}\, \exp\left( -\frac{(1-\phi^2)\bar{h}_1^2}{2\sigma^2} - \sum_{t=1}^{T-1} \frac{(\bar{h}_{t+1} - \phi\bar{h}_t - \rho\sigma\varepsilon_t)^2}{2\sigma^2(1-\rho^2)} \right) \propto \pi_0(\phi)\sqrt{1-\phi^2}\, \exp\left( -\frac{(\phi - \mu_\phi)^2}{2\sigma_\phi^2} \right),$$

where $\bar{h}_t = h_t - \mu$ and

$$\mu_\phi = \frac{\sum_{t=1}^{T-1} (\bar{h}_{t+1} - \sigma\rho\varepsilon_t)\bar{h}_t}{\rho^2\bar{h}_1^2 + \sum_{t=2}^{T-1} \bar{h}_t^2}, \qquad \sigma_\phi^2 = \frac{\sigma^2(1-\rho^2)}{\rho^2\bar{h}_1^2 + \sum_{t=2}^{T-1} \bar{h}_t^2}.$$
The Metropolis-Hastings (M-H) algorithm is employed to sample from the above posterior. We draw a candidate $\phi^*$ from $N(\mu_\phi, \sigma_\phi^2)$ truncated within $(-1, 1)$ to ensure stationarity. The candidate is accepted with probability

$$\min\left\{ \frac{\pi_0(\phi^*)\sqrt{1-\phi^{*2}}}{\pi_0(\phi)\sqrt{1-\phi^2}},\ 1 \right\}.$$
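A minimal sketch of this M-H step (our own illustration; the posterior moments and the flat prior below are placeholders for the quantities computed above):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_phi, s_phi = 0.95, 0.02   # hypothetical (mu_phi, sigma_phi) from the formulas above

def draw_phi(phi_cur, log_prior=lambda p: 0.0):
    while True:              # proposal from N(mu_phi, s_phi^2) truncated to (-1, 1)
        cand = mu_phi + s_phi * rng.standard_normal()
        if -1.0 < cand < 1.0:
            break
    log_acc = (log_prior(cand) + 0.5 * np.log(1 - cand ** 2)
               - log_prior(phi_cur) - 0.5 * np.log(1 - phi_cur ** 2))
    return cand if np.log(rng.uniform()) < log_acc else phi_cur

phi = 0.9
for _ in range(200):
    phi = draw_phi(phi)
```

The Gaussian proposal density cancels in the ratio, leaving only the prior and the $\sqrt{1-\phi^2}$ term, which is why the acceptance probability above is so simple.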
(ii). Sampling the volatility of volatility $\sigma$ and the leverage coefficient $\rho$. The joint posterior distribution $\pi(\sigma, \rho|\cdot)$ is given by

$$\pi(\sigma, \rho|\cdot) \propto \pi_0(\sigma, \rho)\, \sigma^{-T} (1-\rho^2)^{-\frac{T-1}{2}} \exp\left( -\frac{(1-\phi^2)\bar{h}_1^2}{2\sigma^2} - \sum_{t=1}^{T-1} \frac{(\bar{h}_{t+1} - \phi\bar{h}_t - \rho\sigma\varepsilon_t)^2}{2\sigma^2(1-\rho^2)} \right).$$

We can reparameterise the likelihood in the above expression by $\vartheta = \rho\sigma$ and $\varpi = \sigma^2(1-\rho^2)$. If we factorise the joint prior as $\pi_0(\vartheta|\varpi)\pi_0(\varpi)$ and choose $\pi_0(\varpi)$ to be $IG(s_0, r_0)$ and $\pi_0(\vartheta|\varpi)$ to be $N(\vartheta_0, v_\vartheta^2\varpi)$, i.e. a normal-inverse-gamma conjugate prior, new draws can be efficiently generated from $\vartheta|\cdot \sim N(\mu_\vartheta, \sigma_\vartheta^2\varpi)$ and $\varpi|\cdot \sim IG(s_1, r_1)$, where

$$
\begin{aligned}
\sigma_\vartheta^2 &= \left( \frac{1}{v_\vartheta^2} + \sum_{t=1}^{T-1} \varepsilon_t^2 \right)^{-1}, \qquad \mu_\vartheta = \sigma_\vartheta^2 \left( \frac{\vartheta_0}{v_\vartheta^2} + \sum_{t=1}^{T-1} \varepsilon_t(\bar{h}_{t+1} - \phi\bar{h}_t) \right), \\
s_1 &= s_0 + \frac{T}{2}, \qquad r_1 = r_0 + \frac{1}{2}\left( \sum_{t=1}^{T-1} (\bar{h}_{t+1} - \phi\bar{h}_t)^2 - \frac{\mu_\vartheta^2}{\sigma_\vartheta^2} + \frac{\vartheta_0^2}{v_\vartheta^2} \right).
\end{aligned} \tag{14}
$$

The Markov chain is accordingly updated with $\sigma = \sqrt{\vartheta^2 + \varpi}$ and $\rho = \vartheta/\sigma$. Besides efficiency,
this reparametrisation can be easily modified to incorporate shrinkage. This becomes clear when
we study the multivariate model in the next section.
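The reparametrisation is exactly invertible, which the following two-line check (our own, with arbitrary stand-in draw values) makes explicit:

```python
import numpy as np

# suppose (vartheta, varpi) are draws from the N and IG full conditionals in (14)
vartheta, varpi = 0.2, 0.5
sigma = np.sqrt(vartheta ** 2 + varpi)     # recover sigma
rho = vartheta / sigma                     # recover rho
```

By construction $\rho\sigma = \vartheta$ and $\sigma^2(1-\rho^2) = \varpi$, so the chain can be updated in either parametrisation.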
(iii). Sampling the unconditional mean $\mu$ of $h_t$. Let the prior distribution of the unconditional mean $\mu$ be $N(\mu_0, v_\mu^2)$. The conditional posterior distribution is given by

$$\pi(\mu|\cdot) \propto \exp\left( -\frac{(\mu - \mu_0)^2}{2v_\mu^2} - \frac{(1-\phi^2)\bar{h}_1^2}{2\sigma^2} - \sum_{t=1}^{T-1} \frac{(\bar{h}_{t+1} - \phi\bar{h}_t - \rho\sigma\varepsilon_t)^2}{2\sigma^2(1-\rho^2)} \right).$$

We can generate a new draw $\mu|\cdot \sim N(\mu_\mu, \sigma_\mu^2)$ with

$$
\begin{aligned}
\sigma_\mu^2 &= \left( \frac{1}{v_\mu^2} + \frac{(1-\rho^2)(1-\phi^2) + (T-1)(1-\phi)^2}{\sigma^2(1-\rho^2)} \right)^{-1}, \\
\mu_\mu &= \sigma_\mu^2 \left( \frac{\mu_0}{v_\mu^2} + \frac{(1-\rho^2)(1-\phi^2)h_1 + (1-\phi)\sum_{t=1}^{T-1}(h_{t+1} - \phi h_t - \rho\sigma\varepsilon_t)}{\sigma^2(1-\rho^2)} \right).
\end{aligned}
$$
(iv). Sampling the skewness parameter $\beta$. Let the prior distribution of the skewness parameter $\beta$ be $N(\beta_0, v_\beta^2)$. Denoting $\bar{W}_t = W_t - \frac{\zeta}{\zeta-2}$, the conditional posterior distribution follows

$$\pi(\beta|\cdot) \propto \exp\left( -\frac{(\beta - \beta_0)^2}{2v_\beta^2} - \sum_{t=1}^{T} \frac{(y_t - \beta\bar{W}_t e^{h_t/2})^2}{2W_t e^{h_t}} - \sum_{t=1}^{T-1} \frac{\left(\bar{h}_{t+1} - \phi\bar{h}_t - \rho\sigma(y_t e^{-h_t/2} - \beta\bar{W}_t)/\sqrt{W_t}\right)^2}{2\sigma^2(1-\rho^2)} \right),$$

from which a new draw $\beta|\cdot \sim N(\mu_\beta, \sigma_\beta^2)$ can be generated with

$$
\begin{aligned}
\sigma_\beta^2 &= \left( \frac{1}{v_\beta^2} + \frac{1}{1-\rho^2}\sum_{t=1}^{T-1} \frac{\bar{W}_t^2}{W_t} + \frac{\bar{W}_T^2}{W_T} \right)^{-1}, \\
\mu_\beta &= \sigma_\beta^2 \left( \frac{\beta_0}{v_\beta^2} + \frac{1}{1-\rho^2}\sum_{t=1}^{T-1} \frac{y_t\bar{W}_t}{W_t e^{h_t/2}} + \frac{y_T\bar{W}_T}{W_T e^{h_T/2}} - \sum_{t=1}^{T-1} \frac{(\bar{h}_{t+1} - \phi\bar{h}_t)\rho\bar{W}_t}{\sigma(1-\rho^2)\sqrt{W_t}} \right).
\end{aligned} \tag{15}
$$
In the next section, we modify the normal prior for $\beta$ to allow for shrinkage.
(v). Sampling the d.o.f. parameter $\zeta$ of $W_t$. A Gamma prior $\pi_0(\zeta) \equiv G(s_\zeta, r_\zeta)$ is used for the d.o.f. parameter of the mixture process $W_t$. The conditional posterior distribution of $\zeta$ involves the full joint likelihood, which follows

$$\pi(\zeta|\cdot) \propto \pi_0(\zeta) \prod_{t=1}^{T} IG\left(W_t; \tfrac{\zeta}{2}, \tfrac{\zeta}{2}\right) \exp\left( -\sum_{t=1}^{T} \frac{(y_t - \beta\bar{W}_t e^{h_t/2})^2}{2W_t e^{h_t}} - \sum_{t=1}^{T-1} \frac{(\bar{h}_{t+1} - \phi\bar{h}_t - \rho\sigma\varepsilon_t)^2}{2\sigma^2(1-\rho^2)} \right).$$

The M-H algorithm is used to draw $\delta = \log(\zeta - 4)$, based on a normal approximation of the logarithm of the transformed posterior density $\log\pi(\delta|\cdot)$, whose mode and second derivative around the mode are $\mu_\delta$ and $\sigma_\delta^2$, respectively. The draw is accepted with probability

$$\min\left\{ \frac{\pi(\zeta^*|\cdot)\, N(\delta; \mu_\delta, -\sigma_\delta^{-2})\, \exp(\delta^*)}{\pi(\zeta|\cdot)\, N(\delta^*; \mu_\delta, -\sigma_\delta^{-2})\, \exp(\delta)},\ 1 \right\}.$$
2.3 Estimation of the factor stochastic volatility model
Based on our formulation of the univariate SV model in the previous sections, we can write the factor stochastic volatility model compactly as follows:

$$
\begin{aligned}
y_t &= \Lambda f_t + u_t, && t = 1, \ldots, T, \\
\{f_{j,t}\}_{t=1}^{T} &\sim \text{Model (1)}, && \forall j \in \{1, \ldots, p\}, \\
\{u_{i,t}\}_{t=1}^{T} &\sim \text{Model (1)}, && \forall i \in \{1, \ldots, n\},
\end{aligned} \tag{16}
$$
where $y_t \in \mathbb{R}^n$, $f_t \in \mathbb{R}^p$ and $\Lambda \in \mathbb{R}^{n \times p}$. Model (16) says that each factor process $f_{j,t}$ and each idiosyncratic noise process $u_{i,t}$ follows the SV model introduced in Section 2.1. The model proposed above is motivated by both Chib et al. (2006) and Nakajima (2015), but it is considerably more flexible than either. The former models a factor structure but ignores possible heavy-tailedness and skewness in the factor processes. The latter builds on Nakajima and Omori (2012), who are the first to model SV using the generalised hyperbolic skew Student's t-error, but it does not achieve dimension reduction by exploiting the probable factor structure in asset returns.
It is easy to see that model (16) is that it is scalable both in n, the number of assets, and in
p, the number of factors. As long as an efficient sampling procedure is available for analysing
the univairate Model (1), the computational cost is only linear in the dimension of the return
15
series yt3. In the following subsection, we talk about an efficient MCMC algorithm for sampling
factors ft and the factor loadings Λ, followed by a discussion on shrinkage of skewness.
2.3.1 MCMC algorithm for the multivariate model
To distinguish the factor SV and mixture components from the asset-specific processes, we use h_{j,t} and W_{j,t} to denote the SV process and the inverse gamma mixture component for the factor f_{j,t}; l_{i,t} and Q_{i,t} denote those for the idiosyncratic noise u_{i,t}. Superscripts f_j and u_i are used to distinguish the related parameters. That is, for the i-th return series, i = 1, ..., n, the model reads

y_{i,t} = Σ_{j=1}^{p} Λ_{ij}(α^{f_j} + β^{f_j}W_{j,t} + √W_{j,t} ξ_{j,t}) e^{h_{j,t}/2} + (α^{u_i} + β^{u_i}Q_{i,t} + √Q_{i,t} ε_{i,t}) e^{l_{i,t}/2}.

Let h and l denote the sets of SV processes corresponding to the p-dimensional factors f_t and the n-dimensional idiosyncratic noise u_t respectively for t = 1, ..., T, namely h = {h_1, ..., h_p} where h_j = {h_{j,t}}_{t=1}^{T} for j = 1, ..., p, and l = {l_1, ..., l_n} where l_i = {l_{i,t}}_{t=1}^{T} for i = 1, ..., n. We denote the sets of mixture components by W and Q in a similar fashion. A model with n assets and p factors has 6(n + p) + np − (p² + p)/2 parameters with the usual identification restrictions imposed on the factor loadings Λ.
(i). Sampling h, l, W, Q and associated hyperparameters. Similar to Chib et al. (2006), the multivariate model (16) can be separated into n + p univariate SV models as in (1) thanks to the independence structure conditional on the factor process {f_t}_{t=1}^{T} and loadings Λ. Let the (n + p)-dimensional vector z_t be given by z_{j,t} = f_{j,t} for j = 1, ..., p and z_{p+i,t} = u_{i,t} for i = 1, ..., n and t = 1, ..., T, with u_t = y_t − Λf_t. Then one can apply the MCMC algorithm developed in Section 2.2 to analyse z_{k,t} for k = 1, ..., n + p.
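As a concrete illustration of this separation, conditional on draws of Λ and f_t the n + p univariate series can be assembled in a few lines; the sizes and data below are simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, p = 200, 6, 2                        # illustrative sizes
Lam = rng.standard_normal((n, p))          # current draw of the loadings
f = rng.standard_normal((T, p))            # current draw of the factors
y = f @ Lam.T + 0.1 * rng.standard_normal((T, n))   # observed returns

# Conditional on f and Lam, model (16) separates: the first p columns of z
# hold the factor series, the last n columns the idiosyncratic residuals.
u = y - f @ Lam.T                          # u_t = y_t - Lam f_t
z = np.hstack([f, u])                      # shape (T, n + p)
```

Each column z[:, k] is then passed to the univariate sampler of Section 2.2, and the n + p runs can be carried out in parallel, which is the source of the linear scalability noted above.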
Though model (16) is very flexible, it models factor and asset-specific dynamics in a non-discriminatory fashion. One research question addressed in this paper is to find the sources of skewness and leverage effect observed in asset returns. Are they systematic or idiosyncratic? To give an answer, we modify π₀(β) and π₀(ϑ|ϖ) in Section 2.2 with sparsity priors commonly used in Bayesian variable selection (Clyde and George, 2004). Let θ_k = (σ_k, ρ_k, φ_k, µ_k, β_k, ζ_k) collect all the hyperparameters pertaining to the k-th univariate stochastic volatility model z_{k,t} for k = 1, ..., n + p.

³ Linear complexity allows for parallel computing, which greatly saves computing time.

The sparsity prior π₀^{sparse}(β_k) takes the form

Δ_β D₀(β_k) + (1 − Δ_β) N(β₀, v_β²),

where D₀(·) denotes the Dirac delta function at zero, and N(β₀, v_β²) is the normal prior introduced previously. This prior means that β_k has shrinkage probability Δ_β of a point mass at zero and probability 1 − Δ_β of taking a value that is N(β₀, v_β²)-distributed. Under the above sparsity prior, the conditional posterior distribution of β_k is given by

β_k|· ∼ Δ_{β_k} D₀(β_k) + (1 − Δ_{β_k}) N(µ_{β_k}, σ_{β_k}²),
where µ_{β_k} and σ_{β_k}² are the mean and variance of the normal posterior distribution as defined in (15), and where

Δ_{β_k} = (1 − Δ_β)/(Δ_β σ̃_{β_k}² + 1 − Δ_β), with σ̃_{β_k}² = (σ_{β_k}/v_β) exp( µ_{β_k}²/(2σ_{β_k}²) ).    (17)
The shrinkage probability Δ_β has a conjugate beta prior, so posterior draws can be generated given the number of non-zero β_k's in the Markov chain.
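A draw from the spike-and-slab conditional posterior can be sketched as follows. Note that the spike weight below uses the textbook Savage-Dickey form for a zero-mean slab, which may differ in parameterisation from (17), and the argument values in the usage example are arbitrary.

```python
import math
import random

def draw_beta_sparse(mu_beta, sig2_beta, v_beta, delta_prior, rng):
    """Draw beta_k from a two-component posterior: point mass at zero
    (spike) plus N(mu_beta, sig2_beta) (slab), where mu_beta and
    sig2_beta are the slab posterior moments as in (15), v_beta is the
    slab prior std dev and delta_prior the prior shrinkage probability."""
    sig_beta = math.sqrt(sig2_beta)
    # Savage-Dickey evidence ratio in favour of a non-zero beta_k
    ratio = (sig_beta / v_beta) * math.exp(mu_beta ** 2 / (2.0 * sig2_beta))
    w_spike = delta_prior / (delta_prior + (1.0 - delta_prior) * ratio)
    if rng.random() < w_spike:
        return 0.0
    return rng.gauss(mu_beta, sig_beta)

rng = random.Random(0)
beta_weak = draw_beta_sparse(0.0, 0.01, 1.0, 0.9, rng)    # little evidence
beta_strong = draw_beta_sparse(0.5, 0.01, 1.0, 0.5, rng)  # strong evidence
```

With a posterior mean far from zero relative to its standard deviation the spike weight collapses towards zero, so β_k is drawn from the slab; with weak evidence it is set exactly to zero, which is what produces the parsimonious sub-models discussed below.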
Shrinking ρ_k is equivalent to letting ϑ_k|ϖ_k have a sparsity prior under the reparametrisation ϑ_k = ρ_kσ_k and ϖ_k = σ_k²(1 − ρ_k²). The reparametrisation equips ϑ_k|ϖ_k with a normal prior as discussed in Section 2.2, so π₀^{sparse}(ϑ_k|ϖ_k) takes the form

Δ_ϑ D₀(ϑ_k) + (1 − Δ_ϑ) N(ϑ₀, v_ϑ²ϖ_k).

The conditional posterior distribution is thus given by

ϑ_k|· ∼ Δ_{ϑ_k} D₀(ϑ_k) + (1 − Δ_{ϑ_k}) N(µ_{ϑ_k}, σ_{ϑ_k}²ϖ_k),

where µ_{ϑ_k} and σ_{ϑ_k}² are defined in (14). Δ_{ϑ_k} is defined similarly to (17).
As mentioned before, shrinkage on both β_k and ρ_k may help explain the sources of skewness and leverage effect, two important "stylised facts" observed in asset returns. It also leads to different parsimonious models with some β_k's and ρ_k's being zero, which is expected to improve forecasting performance for value-at-risk (VaR) and the covariance matrix (Nakajima, 2015). This also reduces the effort of numerous model comparisons when n is large.
(ii). Sampling the factors f_t. Let us suppress the dependence on h, l, W and Q, as well as on all hyperparameters. Given the factor loadings Λ we have

ỹ_t|f_t ∼ N(Λf_t, U_t), f_t ∼ N(F_t, V_t),

where ỹ_t = (ỹ_{1,t}, ..., ỹ_{n,t})′ and F_t = (F_{1,t}, ..., F_{p,t})′ with

ỹ_{i,t} = y_{i,t} − (α^{u_i} + β^{u_i}Q_{i,t}) e^{l_{i,t}/2}, F_{j,t} = (α^{f_j} + β^{f_j}W_{j,t}) e^{h_{j,t}/2},
V_t = diag(W_{1,t}e^{h_{1,t}}, ..., W_{p,t}e^{h_{p,t}}), U_t = diag(Q_{1,t}e^{l_{1,t}}, ..., Q_{n,t}e^{l_{n,t}}).    (18)

A basic Bayesian calculation shows that at each t the conditional posterior distribution of the factor f_t is N(µ_{f_t}, Σ_{f_t}), where

Σ_{f_t} = (Λ′U_t^{−1}Λ + V_t^{−1})^{−1}, µ_{f_t} = Σ_{f_t}(Λ′U_t^{−1}ỹ_t + V_t^{−1}F_t).
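A minimal sketch of this factor draw, exploiting the diagonality of U_t and V_t; the loadings and inputs in the usage example are illustrative, and the near-noiseless setting is chosen so the draw concentrates on the GLS-type mean above.

```python
import numpy as np

def draw_factor(Lam, y_tilde_t, F_t, U_diag, V_diag, rng):
    """Draw f_t | . ~ N(mu_ft, Sigma_ft) with
    Sigma_ft = (Lam' U^{-1} Lam + V^{-1})^{-1} and
    mu_ft    = Sigma_ft (Lam' U^{-1} y_tilde_t + V^{-1} F_t),
    where U_diag, V_diag hold the diagonals of U_t and V_t from (18)."""
    Ui = 1.0 / U_diag
    prec = (Lam.T * Ui) @ Lam + np.diag(1.0 / V_diag)   # Sigma_ft^{-1}
    Sigma = np.linalg.inv(prec)
    mu = Sigma @ (Lam.T @ (Ui * y_tilde_t) + F_t / V_diag)
    return rng.multivariate_normal(mu, Sigma)

rng = np.random.default_rng(0)
Lam = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# near-noiseless case: the draw should sit close to the implied factor (2, -1)
f_draw = draw_factor(Lam, Lam @ np.array([2.0, -1.0]), np.zeros(2),
                     U_diag=np.full(3, 1e-8), V_diag=np.full(2, 1e8),
                     rng=rng)
```

Because U_t and V_t are diagonal, forming the p × p precision matrix costs O(np²) per period, so this step also scales linearly in n.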
(iii). Sampling the factor loadings Λ. We impose the usual identification restriction on the loading matrix Λ, i.e. the upper p-by-p sub-matrix is lower triangular with ones on the diagonal. It is easy to see that, given the factors f_t and a normal prior, the conditional posterior distribution for the free elements of Λ is normal. However, due to the product form Λf_t appearing in the likelihood, one can expect draws of Λ conditional on f_t to be inefficient. Chib et al. (2006) show that efficiency can be improved via marginalisation of f_t.
Given h, l, W and Q, the conditional log-likelihood function is

ℓ(y_{1:T}|Λ) = Σ_{t=1}^{T} ℓ_t(y_t|Λ) = Σ_{t=1}^{T} log N(ỹ_t; ΛF_t, Ω_t)
= −(1/2) Σ_{t=1}^{T} [ n log 2π + log|Ω_t| + (ỹ_t − ΛF_t)′Ω_t^{−1}(ỹ_t − ΛF_t) ],    (19)
where Ω_t = ΛV_tΛ′ + U_t. The M-H algorithm of Chib and Greenberg (1994) is applied to sample vec(Λ)|· using a multivariate Student's t proposal density T(µ_Λ, Σ_Λ, v), where µ_Λ is the mode of ℓ(y_{1:T}|Λ) and Σ_Λ equals minus the inverse of the approximate Hessian matrix of ℓ(y_{1:T}|Λ) at its mode. The degrees of freedom v are chosen arbitrarily. To find the mode, we propose to use a Hessian-free optimisation routine such as L-BFGS or another quasi-Newton method (Wright and Nocedal, 1999), based on the score function ∂ℓ(y_{1:T}|Λ)/∂λ_{ij} = Σ_{t=1}^{T} ∂ℓ_t(y_t|Λ)/∂λ_{ij}, with λ_{ij} denoting the ij-th free element of Λ and
∂ℓ_t(y_t|Λ)/∂λ_{ij} = −(1/2) ∂log|Ω_t|/∂λ_{ij} − (1/2) ∂/∂λ_{ij} ( ỹ_t′Ω_t^{−1}ỹ_t − 2ỹ_t′Ω_t^{−1}ΛF_t + F_t′Λ′Ω_t^{−1}ΛF_t )
= −tr( Ω_t^{−1}ΛV_t ∂Λ′/∂λ_{ij} ) + ỹ_t′Ω_t^{−1} (∂Λ/∂λ_{ij}) V_tΛ′Ω_t^{−1}ỹ_t + ỹ_t′Ω_t^{−1} (∂Λ/∂λ_{ij}) (I_p − 2V_tΛ′Ω_t^{−1}Λ) F_t
+ F_t′ [ Λ′Ω_t^{−1} (∂Λ/∂λ_{ij}) V_tΛ′Ω_t^{−1}Λ − (1/2)( (∂Λ′/∂λ_{ij}) Ω_t^{−1}Λ + Λ′Ω_t^{−1} (∂Λ/∂λ_{ij}) ) ] F_t,

where Ω_t^{−1} = U_t^{−1} − U_t^{−1}Λ(V_t^{−1} + Λ′U_t^{−1}Λ)^{−1}Λ′U_t^{−1}. After some convergence criterion is met, we
compute the inverse of the observed information matrix, i.e. Σ_Λ = (G(µ_Λ)G(µ_Λ)′)^{−1}, where G(µ_Λ) is the gradient matrix whose t-th column equals vec({∂ℓ_t(y_t|Λ)/∂λ_{ij}}_{i,j}) with i, j running through all free elements of Λ. Then a candidate draw vec(Λ*) is generated from the proposal and accepted with probability

min{ [π₀(vec(Λ*)) exp(ℓ(y_{1:T}|Λ*)) T(vec(Λ); µ_Λ, Σ_Λ, v)] / [π₀(vec(Λ)) exp(ℓ(y_{1:T}|Λ)) T(vec(Λ*); µ_Λ, Σ_Λ, v)], 1 }.
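Since Ω_t = ΛV_tΛ′ + U_t with diagonal U_t and V_t, the Woodbury identity above reduces each n × n inversion to a p × p solve. A small sketch (sizes illustrative):

```python
import numpy as np

def omega_inv(Lam, U_diag, V_diag):
    """Omega_t^{-1} = U^{-1} - U^{-1} Lam (V^{-1} + Lam' U^{-1} Lam)^{-1} Lam' U^{-1},
    exploiting diagonal U_t, V_t so that only a p x p system is solved."""
    Ui = 1.0 / U_diag
    A = Lam * Ui[:, None]                      # U^{-1} Lam, n x p
    core = np.diag(1.0 / V_diag) + Lam.T @ A   # V^{-1} + Lam' U^{-1} Lam
    return np.diag(Ui) - A @ np.linalg.solve(core, A.T)

rng = np.random.default_rng(2)
n, p = 5, 2
Lam = rng.standard_normal((n, p))
U_diag = rng.uniform(0.5, 1.5, n)
V_diag = rng.uniform(0.5, 1.5, p)
Omega = Lam @ np.diag(V_diag) @ Lam.T + np.diag(U_diag)
Oi = omega_inv(Lam, U_diag, V_diag)
```

For p much smaller than n this is the step that keeps the likelihood and score evaluations cheap when optimising over Λ.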
2.3.2 Initialisation
When the dimension of the asset vector y_t in model (16) is large, we advocate initialising the Markov chain carefully rather than starting it randomly, although Theorem 2 says that the Markov kernel K_θ^M implied by EIS-PGAS is guaranteed to converge starting from almost anywhere. In our experiments, such an initialisation may save hours of computation time and accelerate the convergence of the Markov chain to its stationary distribution.

We propose to initialise our model through principal components (PC). Let us rewrite model (16) as Y = FΛ′ + u, where Y ∈ R^{T×n}, F ∈ R^{T×p} and u ∈ R^{T×n}, so that the t-th rows of Y, F and u are y_t′, f_t′ and u_t′ respectively, and f_t is chosen to be the PCs at time t. Equivalently, we have

y = (I_n ⊗ F)λ + u,    (20)
where y = vec(Y), λ = vec(Λ′), and u = vec(u). Under the conditions specified by Doz et al. (2011), the PCs are consistent estimates of the factors, and we apply the criterion of Bai and Ng (2002) to choose the preliminary number of factors. Because we impose identification restrictions on Λ, the matrix of eigenvectors associated with the PCs cannot initialise Λ directly. We notice, however, that (20) is a linear regression model in λ and that the identification restrictions imply a linear constraint of the form

Rλ = r.
This means we can choose the constrained OLS estimate λ_cols to initialise Λ, which is

λ_cols = λ_ols − (I_n ⊗ (F′F)^{−1})R′(R(I_n ⊗ (F′F)^{−1})R′)^{−1}(Rλ_ols − r),    (21)

where λ_ols = (I_n ⊗ (F′F)^{−1}F′)y. Given λ_cols, Doz et al. (2011) suggest that the estimate of the factors E(f_t|y_{1:T}) can be obtained by

f_t = (Λ_cols′ Λ_cols)^{−1} Λ_cols′ y_t, t = 1, ..., T.    (22)

The initialisation of the factors f_t for t = 1, ..., T and the loadings Λ is completed by iterating over (21) and (22) until convergence. The above procedure can be expected to deliver a sound initialisation, especially for the loading matrix Λ under the identification restrictions, as Chan et al. (2013) show that there exists a mapping which effectively rotates the PCs towards the factors under certain identification schemes imposed on the loadings.
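The iteration over (21) and (22) can be sketched without forming R and r explicitly: because the regression in (20) is separable across assets, the identification restriction can be imposed row by row, which is equivalent to the constrained OLS. Dimensions and data below are illustrative.

```python
import numpy as np

def init_loadings_factors(Y, p, n_iter=50):
    """Alternate row-wise constrained OLS for Lam (equivalent to eq. (21))
    with the factor update (22).  The restriction fixes the upper p x p
    block of Lam to be lower triangular with unit diagonal."""
    T, n = Y.shape
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    F = Y @ Vt[:p].T                      # principal components as start
    Lam = np.zeros((n, p))
    for _ in range(n_iter):
        for i in range(n):
            if i < p:                     # restricted row i: Lam[i, i] = 1
                Lam[i] = 0.0
                Lam[i, i] = 1.0
                if i > 0:                 # free sub-diagonal entries
                    Lam[i, :i] = np.linalg.lstsq(F[:, :i], Y[:, i] - F[:, i],
                                                 rcond=None)[0]
            else:                         # unrestricted row: plain OLS
                Lam[i] = np.linalg.lstsq(F, Y[:, i], rcond=None)[0]
        F = Y @ Lam @ np.linalg.inv(Lam.T @ Lam)   # eq. (22)
    return Lam, F

# toy data respecting the restriction
rng = np.random.default_rng(3)
T, n, p = 300, 5, 2
F0 = rng.standard_normal((T, p))
L0 = rng.standard_normal((n, p))
L0[0] = [1.0, 0.0]
L0[1, 1] = 1.0
Y = F0 @ L0.T + 0.01 * rng.standard_normal((T, n))
Lam0, F_init = init_loadings_factors(Y, p)
```

The returned Lam0 satisfies the identification restriction exactly at every iteration, so the initialised loadings can be handed straight to the MCMC step for Λ.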
With initialised Λ and f_t, residuals are obtained as u_t = y_t − Λf_t, so we obtain the n + p univariate series z_{j,t} = f_{j,t} for j = 1, ..., p and z_{p+i,t} = u_{i,t} for i = 1, ..., n. For any k ∈ {1, ..., n + p}, {z_{k,t}}_{t=1}^{T} is modelled as a basic SV model and reparametrised according to Ruiz (1994), so that quasi-maximum likelihood (QML) estimation can be efficiently implemented for the following approximate linear Gaussian state space model

log(z_{k,t}²) = log(2) + ψ(1/2) + h_{k,t} + √ψ′(1/2) ε_{k,t}, t = 1, ..., T,
h_{k,t+1} = µ_k(1 − φ_k) + φ_k h_{k,t} + σ_k η_{k,t}, t = 1, ..., T − 1,    (23)

where ψ(·) is the digamma function and ψ′(·) is its first derivative. ε_{k,t} and η_{k,t} are i.i.d. normal with correlation coefficient ρ_k. Maximising the log-likelihood via the Kalman filter (Durbin and Koopman 2012) gives QML estimates of φ_k, σ_k, ρ_k and µ_k, which serve as initial values for k = 1, ..., n + p. We choose the initial value of the skewness parameter β_k to be zero and the d.o.f ζ_k to be 20 for all k.
The Markov chains of the SV processes {h_{j,t}}_{t=1}^{T} and {l_{i,t}}_{t=1}^{T} for all i and j are initialised by applying the simulation smoother of De Jong and Shephard (1995) to state space model (23). The chains of the inverse gamma mixing components W_{j,t} and Q_{i,t} for all i and j are initialised by drawing from IG(s_k, r_{k,t}), where s_k = ζ_k/2 + 1 and r_{k,t} = ζ_k/2 + z_{k,t}² exp(−h_{k,t})/2.
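These inverse gamma initial draws can be generated via reciprocal Gamma variates; the series z_k, h_k and d.o.f value below stand in for the initialised quantities above.

```python
import numpy as np

def init_mixing(z_k, h_k, zeta_k, rng):
    """Initial draw of the mixing components from IG(s_k, r_{k,t}) with
    s_k = zeta_k/2 + 1 and r_{k,t} = zeta_k/2 + z_{k,t}^2 exp(-h_{k,t})/2.
    An IG(s, r) variate equals r divided by a Gamma(s, 1) variate."""
    s = zeta_k / 2.0 + 1.0
    r = zeta_k / 2.0 + 0.5 * z_k ** 2 * np.exp(-h_k)
    return r / rng.gamma(shape=s, scale=1.0, size=z_k.shape)

rng = np.random.default_rng(4)
z_k = 0.02 * rng.standard_normal(1000)   # illustrative return-scale series
h_k = np.full(1000, -8.0)                # illustrative initialised SV path
W_init = init_mixing(z_k, h_k, zeta_k=20.0, rng=rng)
```

The draws are strictly positive with mean close to one for a moderate ζ_k, so the mixture components start near the Gaussian benchmark W_t = 1.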
3 Model evaluation
This section introduces a method for Bayesian model comparison which relies on the calculation of the marginal likelihood p(y_{1:T}|M) under a certain model M, from which the Bayes factor p(y_{1:T}|M₁)/p(y_{1:T}|M₂) can be calculated for models M₁ and M₂. We implement a marginalised version of the importance sampling squared (IS²) method of Tran et al. (2014) for the proposed factor SV model. IS² produces an efficient and accurate estimate of the marginal likelihood when the conditional likelihood p(y_{1:T}|M_i, θ_i) with hyperparameter vector θ_i is intractable but can be estimated unbiasedly. A detailed simulation study in the Appendix shows that IS² is fully functional and accurate in picking the true model, bolstering the results of Tran et al. (2014), who only apply IS² to univariate SV models.
To start with, suppressing the dependence on a certain model, the marginal likelihood can be written as

p(y_{1:T}) = ∫ p(y_{1:T}, θ) dθ = ∫ p(y_{1:T}|θ) π₀(θ) dθ = ∫ [p(y_{1:T}|θ) π₀(θ) / q(θ|y_{1:T})] q(θ|y_{1:T}) dθ,

where π₀(θ) is the prior, and q(θ|y_{1:T}) is an importance density mimicking the posterior π(θ|y_{1:T}) ∝ p(y_{1:T}|θ)π₀(θ). The above integral can thus be approximated by Monte Carlo simulation,

p̂(y_{1:T}) = (1/S) Σ_{s=1}^{S} w(θ^s), where w(θ^s) = p(y_{1:T}|θ^s)π₀(θ^s)/q(θ^s|y_{1:T}) and θ^s ∼ q(θ|y_{1:T}).    (24)
This is straightforward to implement if the likelihood p(y_{1:T}|θ) is available in closed form. However, in our case

p(y_{1:T}|θ) = ∫···∫ p(y_{1:T}, f_{1:T}, h_{1:T}, l_{1:T}, W_{1:T}, Q_{1:T}|θ) df_{1:T} dh_{1:T} dl_{1:T} dW_{1:T} dQ_{1:T},

which is high-dimensional and intractable. Tran et al. (2014) show under mild conditions that if there exists an unbiased estimate p̂(y_{1:T}|θ) of the likelihood, i.e. E(p̂(y_{1:T}|θ)) = p(y_{1:T}|θ), averaging importance weights to compute the marginal likelihood as in formula (24) remains valid with p(y_{1:T}|θ) replaced by p̂(y_{1:T}|θ). In the next subsection, a simulation-based method is introduced for efficiently computing the marginal likelihood using an unbiased estimate of the likelihood.
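The weight averaging in (24) is simple to implement in log space. The sketch below verifies it on a toy conjugate example in which q equals the exact posterior, so the weights are constant and the estimate is exact; in our setting `log_lik_hat` would return the log of the PEIS likelihood estimate.

```python
import math
import random

def is2_log_marginal(theta_draws, log_lik_hat, log_prior, log_q):
    """Log of the IS^2 estimate (24): average the weights
    w(theta) = p_hat(y|theta) pi_0(theta) / q(theta|y) over draws from q,
    using the log-sum-exp trick for numerical stability."""
    logw = [log_lik_hat(th) + log_prior(th) - log_q(th) for th in theta_draws]
    m = max(logw)
    return m + math.log(sum(math.exp(lw - m) for lw in logw) / len(logw))

# toy check: y = 0 observed, y|theta ~ N(theta, 1), theta ~ N(0, 1);
# the exact posterior N(0, 1/2) serves as q, so every weight equals the
# true marginal likelihood N(0; 0, 2).
rng = random.Random(0)
draws = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(100)]
log_lik = lambda th: -0.5 * math.log(2 * math.pi) - 0.5 * th * th
log_pri = lambda th: -0.5 * math.log(2 * math.pi) - 0.5 * th * th
log_q = lambda th: -0.5 * math.log(math.pi) - th * th
est = is2_log_marginal(draws, log_lik, log_pri, log_q)
```

The closer q is to the posterior, the smaller the variance of the log-weights, which is exactly why the Gaussian-mixture choice of q discussed at the end of this section matters.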
3.1 Marginal likelihood
For many state space models, an unbiased estimate of the likelihood is readily available from the particle marginal Metropolis-Hastings (PMMH) algorithm (Andrieu et al. 2010 and Del Moral 2004). For example, the factor SV paper of Chib et al. (2006) applies the celebrated auxiliary particle filter (APF) of Pitt and Shephard (1999a) to compute the posterior ordinate for the evaluation of Bayes factors. According to Scharth and Kohn (2016), the particle efficient importance sampling (PEIS) algorithm significantly outperforms PMMH in terms of variance reduction for the likelihood estimate, which is essential for efficient computation of the marginal likelihood. Similar to the EIS-PGAS method introduced in section 2.2, which builds a sequential but globally optimal importance density q(x_t|x_{t−1}, y_{1:T}), PEIS is a direct extension of the APF algorithm, which only uses one-period (or few-period) forward weights based on q(x_t|x_{t−1}, y_{1:t}) for resampling. Conceptually, the global optimality of PEIS, which minimises the variance of the importance weights in (10) and (5), is what makes it highly efficient for evaluating the marginal likelihood.
Similar to EIS-PGAS, PEIS constructs the importance density for x_t at t = 1, ..., T, where

x_t = ({h_{j,t}}_{j=1}^{p}, {W_{j,t}}_{j=1}^{p}, {l_{i,t}}_{i=1}^{n}, {Q_{i,t}}_{i=1}^{n}),

which involves solving a high-dimensional least squares problem in light of (9). Computation becomes costly in this case. Scharth and Kohn (2016) also document that numerical instability can cause PEIS to fail when the importance density is ill-constructed. To circumvent this issue, f_t is first marginalised out throughout the analysis. Given θ we have

y_t|x_t ∼ N(ȳ_t + ΛF_t, Ω_t),    (25)

where Ω_t = ΛV_tΛ′ + U_t, ȳ_t is the vector with elements ȳ_{i,t} = (α^{u_i} + β^{u_i}Q_{i,t})e^{l_{i,t}/2} (so that ỹ_t = y_t − ȳ_t), and ỹ_{i,t} for i = 1, ..., n, F_{j,t} for j = 1, ..., p, V_t and U_t are defined in (18). Furthermore, we propose to use a "suboptimal" importance density which is formed from p + n individual importance densities.

Suppose that after the n-th EIS iteration one has the importance density with parameters δ_t^{(n)} = {δ_{j,t}^{(n)}, δ_{i,t}^{(n)}} for t = 1, ..., T, corresponding to n + p individual importance densities as in (8), with parameters {δ_{j,t}^{(n)}}_{j=1}^{p} for {h_{j,t}}_{j=1}^{p}, {W_{j,t}}_{j=1}^{p} and parameters {δ_{i,t}^{(n)}}_{i=1}^{n} for {l_{i,t}}_{i=1}^{n}, {Q_{i,t}}_{i=1}^{n}. Because f_t is marginalised out, we cannot recover the leverage effect in either the factor or the asset-specific processes, which is present in the evolution of the SV processes. However, we notice that the factor process
at time t can be approximated by

f_t^{(n)} = (Λ′U_t^{(n)−1}Λ + V_t^{(n)−1})^{−1}(Λ′U_t^{(n)−1}ỹ_t^{(n)} + V_t^{(n)−1}F_t^{(n)}),    (26)

and the idiosyncratic noise at time t is thus

u_t^{(n)} = y_t − Λf_t^{(n)}.
With both f_t^{(n)} and u_t^{(n)}, the leverage effect can be accounted for, so we can update δ_{j,t}^{(n+1)} and δ_{i,t}^{(n+1)} by n + p least squares regressions as in (9). Approximating the importance density for the multivariate model by building p + n individual EIS importance densities makes this procedure suboptimal, because f_t and u_t are not observed, and possible correlations among h_{j,t}|y_{1:T} across j and among l_{i,t}|y_{1:T} across i are discarded. Nevertheless, we find that this procedure works sufficiently well, without any numerical failure, for all simulated and real datasets that we analyse.
To compute the unbiased estimate of the likelihood p̂(y_{1:T}|θ) = p̂(y₁|θ) Π_{t=2}^{T} p̂(y_t|y_{1:t−1}, θ), one propagates the particle system with forward-weight resampling (Shephard and Pitt 1997 and Scharth and Kohn 2016). Suppose that at time t one has the particle system {x_{1:t}^i, ω_t^i}_{i=1}^{M}. Suppressing the dependence on θ, the forward weights are calculated according to

−→ω_t^i = ω̄_t^i χ_q(x_t^i; δ_{t+1}) / χ_p(x_{t−1}^i; y_{t−1}), i = 1, ..., M,

where ω̄_t^i is the normalised weight ω̄_t^i = ω_t^i / Σ_{i=1}^{M} ω_t^i. χ_q(x_t^i; δ_{t+1}) is the integration constant of the importance density q(x_{t+1}|x_t^i, y_{1:T}) with kernel k_q(x_{t+1}^i, x_t^i; δ_{t+1}), while χ_p(x_{t−1}^i; y_{t−1}) is the integration constant of the transition density p(x_t|x_{t−1}^i, y_{t−1}), which is the product of n + p densities. Also, q(x_{t+1}^i|x_t^i, y_{1:T}) is the product of n + p individual EIS importance densities.
Next, with the normalised forward weights

−→ω̄_t^i = −→ω_t^i / Σ_{j=1}^{M} −→ω_t^j,

one calculates the effective sample size ESS = 1/Σ_{i=1}^{M}(−→ω̄_t^i)². If the ESS drops below a predetermined threshold, resampling is applied to the M particles {x_t^i}_{i=1}^{M} with probabilities {−→ω̄_t^i}_{i=1}^{M}, and all normalised weights ω̄_t^i are set to 1/M for i = 1, ..., M. At time t + 1, M new particles {x_{t+1}^i}_{i=1}^{M} need to be generated from the importance density q(x_{t+1}|x_t^i, y_{1:T}), which requires M
draws from the p + n individual importance densities, each as in (8). To do so, we approximate the factor process and idiosyncratic noise process at time t via (26). For example, h_{j,t+1}^i for i = 1, ..., M can be obtained by

h_{j,t+1}^i = µ^{f_j}(1 − φ^{f_j}) + φ^{f_j} h_{j,t}^i + (ρ^{f_j}σ^{f_j}/√W_{j,t}^i)(f_{j,t}^i e^{−h_{j,t}^i/2} − α^{f_j} − β^{f_j}W_{j,t}^i) + √(1 − ρ^{f_j}²) σ^{f_j} η_{j,t}^{*i},
with η_{j,t}^{*i} ∼ N(0, 1) for j = 1, ..., p. The other latent processes propagate similarly. Antithetic variables are used to reduce Monte Carlo noise during particle propagation. In particular, pairs of perfectly negatively correlated Gaussian variables are generated for all SV processes (Durbin and Koopman 2000 and Scharth and Kohn 2016), and pairs of inverse gamma variables are generated using a Gaussian copula with perfect negative correlation for the mixture components.
Once the propagation of all particles is finished, the importance weights are recalculated as

ω_{t+1}^i = ω̄_t^i × p(y_{t+1}|x_{t+1}^i) p(x_{t+1}^i|x_t^i, y_t) / k_q(x_{t+1}^i, x_t^i; δ_{t+1}), if resampling occurred,
ω_{t+1}^i = ω̄_t^i × p(y_{t+1}|x_{t+1}^i) p(x_{t+1}^i|x_t^i, y_t) / q(x_{t+1}^i|x_t^i, y_{1:T}), otherwise,

where the conditional observation density p(y_t|x_t) is given by (25). The propagation of the particle system at time t completes by calculating the estimate of the likelihood contribution via

p̂(y_{t+1}|y_{1:t}) = (Σ_{i=1}^{M} −→ω_t^i)(Σ_{i=1}^{M} ω_{t+1}^i), if resampling occurred,
p̂(y_{t+1}|y_{1:t}) = Σ_{i=1}^{M} ω_{t+1}^i, otherwise.

Once the propagation reaches time T, the unbiased estimate of the likelihood is simply given by p̂(y_{1:T}|θ) = p̂(y₁|θ) Π_{t=2}^{T} p̂(y_t|y_{1:t−1}, θ), with the obvious modification for p̂(y₁|θ).
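The ESS-triggered resampling step above can be sketched generically; the snippet uses plain multinomial resampling on the (forward) weights, and the threshold and particle values are illustrative.

```python
import numpy as np

def ess(w_bar):
    """Effective sample size of normalised weights."""
    return 1.0 / np.sum(w_bar ** 2)

def resample_if_needed(particles, w_bar, threshold, rng):
    """If ESS < threshold, draw M indices with probabilities w_bar and
    reset all weights to 1/M; otherwise leave the system unchanged."""
    M = len(particles)
    if ess(w_bar) < threshold:
        idx = rng.choice(M, size=M, p=w_bar)
        return particles[idx], np.full(M, 1.0 / M), True
    return particles, w_bar, False

rng = np.random.default_rng(5)
particles = np.array([0.1, 0.2, 0.3, 0.4])
degenerate = np.array([1.0, 0.0, 0.0, 0.0])   # fully degenerate weights
new_p, new_w, did = resample_if_needed(particles, degenerate, 2.0, rng)
```

Resampling only when the ESS falls below the threshold avoids the excessive weight resetting that, as noted in Section 4, can degrade ancestor-sampling-based algorithms.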
The unbiased estimate p̂(y_{1:T}|θ) of the likelihood found by PEIS is the central piece for applying IS² to estimate the marginal likelihood. It follows that

p̂(y_{1:T}) = (1/S) Σ_{s=1}^{S} w(θ^s), where w(θ^s) = p̂(y_{1:T}|θ^s)π₀(θ^s)/q(θ^s|y_{1:T}) and θ^s ∼ q(θ|y_{1:T}).    (27)
We follow Tran et al. (2014) in choosing the optimal number of particles in PEIS, which balances the trade-off between variance reduction and computing time. They show that when there is an overhead cost of designing the importance density, such as the computing time allocated to finding the EIS importance density, the total computing time is the product of the overhead cost and a term exponential in Var(log p̂(y_{1:T}|θ)). This sheds some light on our proposed way of constructing the importance density: it is less costly and more stable to build p + n EIS importance densities as an approximation than to build an exact high-dimensional EIS importance density.

Lastly, we choose q(θ|y_{1:T}) to be an m-component Gaussian mixture constructed from the Markov chain for θ via the EM algorithm. The number of components m is determined by BIC.
3.2 Forecasting and filtering
It is straightforward to perform forecasting based on the output from a particle filtering algorithm. Keeping θ at its posterior mean, at time T the particles {x_T^k}_{k=1}^{K} with normalised weights {ω̄_T^k}_{k=1}^{K}⁴ are propagated one period forward based on their transition dynamics. Due to the leverage effect, we still approximate the factors and idiosyncratic noise at time T through (26). The 1-period ahead forecast ŷ_{T+1} is given by

ŷ_{T+1}|y_{1:T}, θ = Σ_{k=1}^{K} ω̄_T^k (y_{T+1}^k|y_{1:T}, θ),

where each y_{T+1}^k|y_{1:T}, θ is imputed from

y_{T+1}^k|y_{1:T}, θ ∼ N(ȳ_{T+1}^k + ΛF_{T+1}^k, Ω_{T+1}^k), Ω_{T+1}^k = ΛV_{T+1}^k Λ′ + U_{T+1}^k,
where ȳ_{T+1}^k, F_{T+1}^k, V_{T+1}^k and U_{T+1}^k are as in (18) and (25). Propagating all SV processes and inverse gamma mixture components S periods forward, S ≥ 2, gives the multi-period ahead forecast

ŷ_{T+S}|y_{1:T}, θ = Σ_{k=1}^{K} ω̄_T^k (y_{T+S}^k|y_{1:T}, θ).    (28)

The predicted total return over S periods, Σ_{s=1}^{S} y_{T+s}|y_{1:T}, θ, thus follows a Gaussian mixture distribution

Σ_{s=1}^{S} y_{T+s}|y_{1:T}, θ ∼ Σ_{k=1}^{K} ω̄_T^k N( Σ_{s=1}^{S} (ȳ_{T+s}^k + ΛF_{T+s}^k), Σ_{s=1}^{S} Ω_{T+s}^k ).

The above can be used to estimate moments of returns over S periods, which are essential for portfolio management and benchmarking, and also other statistics such as the tail index and VaR.
It is also of interest to find filtered estimates of the mean return, covariance matrix and correlation matrix when financial decisions need to be made online. The filtered mean return and

⁴ K here is chosen to be larger than the number of particles M used to estimate the likelihood.
covariance matrix are

µ_{t|t−1} = E(y_t|y_{1:t−1}, θ), Ω_{t|t−1} = E(Ω_t|y_{1:t−1}, θ),

which can be estimated by

µ̂_{t|t−1} = Σ_{k=1}^{K} ω̄_{t−1}^k (ȳ_t^k + ΛF_t^k), Ω̂_{t|t−1} = Σ_{k=1}^{K} ω̄_{t−1}^k Ω_t^k,

where Ω_t^k = ΛV_t^k Λ′ + U_t^k and {ω̄_{t−1}^k}_{k=1}^{K} are the normalised weights pertaining to the particle system at time t − 1. Chib et al. (2006) express the filtered correlation as

R_{t|t−1} = E(Υ_t|y_{1:t−1}, θ).

Υ_t is the conditional correlation matrix for y_t|{h_{j,t}}_{j=1}^{p}, {W_{j,t}}_{j=1}^{p}, {l_{i,t}}_{i=1}^{n}, {Q_{i,t}}_{i=1}^{n}, or

Υ_t = D(Ω_t)^{−1/2} Ω_t D(Ω_t)^{−1/2},

where D(Σ) denotes the matrix with diagonal elements equal to those of Σ and zero off-diagonal elements. So R_{t|t−1} can be estimated by

R̂_{t|t−1} = Σ_{k=1}^{K} ω̄_{t−1}^k Υ_t^k = Σ_{k=1}^{K} ω̄_{t−1}^k D(Ω_t^k)^{−1/2} Ω_t^k D(Ω_t^k)^{−1/2}.
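The map from a covariance draw to the correlation matrix and the weighted average in the last display can be written compactly; the single-particle input below is illustrative.

```python
import numpy as np

def corr_from_cov(Omega):
    """Upsilon = D(Omega)^{-1/2} Omega D(Omega)^{-1/2}."""
    d = 1.0 / np.sqrt(np.diag(Omega))
    return Omega * np.outer(d, d)

def filtered_correlation(weights, Omegas):
    """R_{t|t-1} estimate: weighted average of per-particle correlations."""
    return sum(w * corr_from_cov(Om) for w, Om in zip(weights, Omegas))

Omega = np.array([[4.0, 2.0], [2.0, 9.0]])   # one particle's covariance
R = filtered_correlation([1.0], [Omega])
```

Note that averaging correlations of individual Ω_t^k draws (as above) is not the same as taking the correlation of the averaged covariance, which is why Chib et al. (2006) define R_{t|t−1} as the expectation of Υ_t.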
4 Simulation study
For ease of exposition, this section only investigates the effectiveness and efficiency of the proposed sampling method via a simulation study of the univariate SV model (1). We highlight the efficiency contribution of EIS-PGAS in sampling the hyperparameters and latent processes, in comparison with the method developed by Nakajima and Omori (2012). An extensive and detailed simulation study of the high-dimensional factor SV model (16) is given in the Appendix.
To the best of our knowledge, Nakajima and Omori (2012) pioneered the SV model with leverage effect and generalised hyperbolic skew Student's t-distributed errors. They implement a Metropolis-within-Gibbs sampling algorithm exploiting the mean-variance mixture representation of the error distribution. Inspired by their approach, we propose EIS-PGAS in section 2.2 and focus on the efficiency gain resulting from both EIS and ancestor sampling. EIS-PGAS samples the SV process h_t|y_{1:T}, θ and mixture component W_t|y_{1:T}, θ simultaneously, while Nakajima and Omori apply the multi-move sampler of Watanabe and Omori (2004) to sample h_t|y_{1:T}, W_{1:T}, θ and W_t|y_{1:T}, h_{1:T}, θ sequentially. Their sampler is less efficient than ours because h_t and W_t appear in product form in the observation likelihood p(y_t|h_t, W_t, θ). In this simulation study, we show that the efficiency gain is more than moderate.
4.1 Model setup
We simulate 500 series, each of length T = 2000, from model (1) with fixed parameter values φ = 0.95, σ = 0.15, ρ = −0.5, µ = −9, β = −0.5, and ζ = 20. Typical time series for y_t, h_t and W_t are shown in Figure 2.
Figure 2: A simulated path of yt with latent process ht and Wt. Left: yt; middle: ht; right: Wt.
We specify the following priors:

(φ + 1)/2 ∼ Beta(20, 1.5), ϖ ∼ IG(2.5, 0.025), ϑ|ϖ ∼ N(0, 20ϖ),
µ ∼ N(−10, 1), β ∼ N(0, 1), ζ ∼ Gamma(20, 1.25)·1(ζ > 4),    (29)

where ϖ = (1 − ρ²)σ², ϑ = ρσ, and 1(·) is an indicator function which equals one if the condition in brackets holds and zero otherwise. The joint prior π₀(ϑ, ϖ) = π₀(ϑ|ϖ)π₀(ϖ) is a conjugate normal-inverse-gamma prior, which facilitates the use of the shrinkage priors in the factor SV model in section 2.3.1. The above prior distributions reflect popular choices in the SV literature. To compare the performance of different sampling schemes, we consider the following methods:
• EIS-PGAS: Our baseline method – particle Gibbs with ancestor sampling and an EIS importance density;
• EIS-PG: Basic particle Gibbs with EIS importance density;
• BF-PGAS: Particle Gibbs with ancestor sampling using bootstrap filter;
• MM-MH: The method of Nakajima and Omori (2012) – a multi-move sampler for h_t, conditional on which W_t is drawn via an accept-reject M-H algorithm.
While BF-PGAS uses 20,000 particles in the particle propagation, both EIS-PGAS and EIS-PG use only 10 particles. In total 22,000 samples are drawn for each parameter, with the initial 2,000 samples discarded as burn-in. We base our comparison on the inefficiency factor to check the efficiency of the different sampling schemes. The inefficiency factor for a parameter θ is defined as IE(θ) = 1 + 2 Σ_{j=1}^{∞} ρ_j(θ), where ρ_j(θ) is the j-th sample autocorrelation. Chib (2001) shows that IE(θ) measures the degree of mixing of the Markov chain for θ|·. If IE(θ) = m, the MCMC algorithm requires m times more samples than drawing uncorrelated samples directly. We use a Parzen window with bandwidth 1000 to compute the inefficiency factor.
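The inefficiency factor with a Parzen window can be computed as follows; the bandwidth and the two test chains (an i.i.d. chain and a persistent AR(1)) are illustrative.

```python
import numpy as np

def parzen(x):
    """Parzen lag window on [0, 1]."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x ** 2 + 6.0 * x ** 3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def inefficiency_factor(chain, bandwidth=1000):
    """IE = 1 + 2 * sum_j k(j/B) * rho_j with a Parzen window k."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    ie = 1.0
    for j in range(1, min(bandwidth, n - 1) + 1):
        rho_j = np.dot(x[:-j], x[j:]) / (n * var)
        ie += 2.0 * parzen(j / bandwidth) * rho_j
    return ie

rng = np.random.default_rng(6)
iid = rng.standard_normal(20000)     # uncorrelated chain: IE near 1
ar = np.empty(20000)                 # persistent AR(1): IE far above 1
ar[0] = 0.0
for t in range(1, 20000):
    ar[t] = 0.9 * ar[t - 1] + iid[t]
```

The kernel downweights long lags, so the estimate is stable despite the noisy sample autocorrelations at high lags; for the AR(1) chain with coefficient 0.9 the estimate is close to the theoretical value (1 + 0.9)/(1 − 0.9) = 19.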
4.2 Estimation results
Figure 3 reports the sample autocorrelation functions (ACF), the Markov chain sample paths and the posterior density estimates for one simulated series estimated by EIS-PGAS. Figures obtained from other simulated series do not suggest qualitative differences. Compared with a similar figure in Nakajima and Omori (2012), one can already see that the ACFs of the parameters estimated by EIS-PGAS decay much more quickly than those under MM-MH, especially for φ, β and ζ, implying the higher efficiency of EIS-PGAS.
Figure 3: EIS-PGAS MCMC results for a randomly chosen series simulated from model (1). From top to bottom: sample autocorrelations, sample paths, and posterior density estimates; from left to right: φ, σ, ρ, µ, β and ζ.
Table 1: MCMC Results of Different Methods for the Univariate Model

             EIS-PGAS                                      MM-MH
θ    Mean     St.dev.  95% C.I.           IE(θ)    Mean     St.dev.  95% C.I.           IE(θ)
φ    0.955    0.010    [0.939, 0.978]     4.526    0.942    0.010    [0.921, 0.973]     81.532
σ    0.151    0.008    [0.148, 0.172]     11.046   0.167    0.015    [0.139, 0.202]     159.406
ρ    -0.512   0.037    [-0.602, -0.415]   23.218   -0.542   0.067    [-0.731, -0.444]   79.157
µ    -8.979   0.096    [-9.081, -8.755]   4.326    -8.920   0.123    [-9.124, -8.795]   27.582
β    -0.573   0.101    [-0.790, -0.400]   16.027   -0.714   0.255    [-1.346, -0.318]   163.733
ζ    18.963   3.622    [16.462, 26.348]   36.356   28.655   5.117    [16.067, 37.881]   299.057

             EIS-PG                                        BF-PGAS
θ    Mean     St.dev.  95% C.I.           IE(θ)    Mean     St.dev.  95% C.I.           IE(θ)
φ    0.965    0.011    [0.942, 0.975]     64.057   0.812    0.122    [0.764, 0.992]     16.746
σ    0.162    0.006    [0.154, 0.187]     132.744  0.237    0.094    [0.167, 0.304]     73.569
ρ    -0.522   0.034    [-0.712, -0.424]   92.246   -0.204   0.120    [-0.421, 0.086]    52.74
µ    -9.343   0.114    [-9.547, -8.834]   15.321   -10.657  0.284    [-11.050, -8.891]  24.315
β    -0.688   0.162    [-0.842, -0.460]   93.682   -0.137   0.630    [-1.143, 0.722]    51.985
ζ    22.462   3.459    [16.785, 30.114]   123.37   46.864   12.795   [19.674, 65.049]   96.781

¹ Reported are the averages of the posterior means, standard deviations, lower and upper bounds of the 95% credible intervals, and inefficiency factors over all 500 simulated series, each obtained from 20,000 MCMC samples after a burn-in period of 2,000 samples.
² True DGP: φ = 0.95, σ = 0.15, ρ = −0.5, µ = −9, β = −0.5 and ζ = 20.
Table 1 reports averages of some posterior statistics under the four estimation methods over all 500 simulated series. Except for BF-PGAS, the posterior means of all parameters are found to be close to the DGP. As argued by Lindsten et al. (2014), the frequent resampling in the bootstrap filter may render the PGAS algorithm inaccurate because the probability that the reference trajectory degenerates to an improbable value does not converge to zero. Besides the inaccurate posterior means, the bootstrap filter also leads to much higher standard deviations of the posterior distribution. The credible intervals of ρ and β covering zero mean that this method effectively fails to capture the leverage effect and skewness of y_t. A conclusion from this observation is that, for the univariate SV model (1), any sampling scheme needs some form of importance density mimicking the conditional posterior distribution of the latent processes in order to deliver reasonable posterior estimates. MM-MH depends on a second-order local approximation of p(h_{1:T}|y_{1:T}, W_{1:T}, θ) to draw h_t, while the EIS-based methods generate h_{1:T} and W_{1:T} together based on a global approximation to p(h_{1:T}, W_{1:T}|y_{1:T}, θ). Importantly, the inefficiency factors IE(θ) under EIS-PGAS are much lower than those under MM-MH. For some simulated series, our proposed EIS-PGAS achieves IE(φ), IE(σ) and IE(ζ) values that are 20 times smaller than under MM-MH. Comparing the IE(θ) of EIS-PGAS with that of EIS-PG, it can
Table 2: Correlation Matrix of Posterior Samples

EIS-PGAS
      φ       σ       ρ       µ       β       ζ
φ     1     -0.27    0.02    0.01    0.00   -0.03
σ             1     -0.04   -0.03   -0.03    0.05
ρ                     1      0.05    0.14   -0.02
µ                             1      0.15    0.11
β                                     1     -0.24
ζ                                             1

MM-MH
      φ       σ       ρ       µ       β       ζ
φ     1     -0.64   -0.15    0.05   -0.01   -0.03
σ             1      0.13   -0.10   -0.16    0.08
ρ                     1      0.04    0.21    0.13
µ                             1      0.27    0.18
β                                     1     -0.79
ζ                                             1

EIS-PG
      φ       σ       ρ       µ       β       ζ
φ     1     -0.48   -0.17    0.01   -0.01   -0.04
σ             1      0.09   -0.11   -0.08    0.05
ρ                     1      0.07    0.13    0.06
µ                             1      0.19    0.20
β                                     1     -0.28
ζ                                             1

BF-PGAS
      φ       σ       ρ       µ       β       ζ
φ     1     -0.59   -0.11    0.09    0.05   -0.10
σ             1     -0.15   -0.18   -0.08    0.13
ρ                     1      0.15    0.08    0.19
µ                             1      0.16    0.35
β                                     1     -0.45
ζ                                             1

Reported are the averages of the correlation matrices of posterior samples over all 500 simulated series.
be said that the majority of the efficiency gain, in terms of how well the Markov chain mixes, comes from the ancestor sampling employed in the particle Gibbs algorithm; however, the EIS importance density used when propagating particles is also instrumental, as a comparison of EIS-PG with MM-MH, or of EIS-PGAS with BF-PGAS, shows. Notice that although BF-PGAS achieves good mixing of the Markov chain, its posterior estimates are effectively useless as they are too far from the DGP.

Table 2 shows the averages of the correlation matrices of the posterior samples under the four estimation methods. If the correlation coefficient between two chains is near unity in absolute value, the sampler is inefficient as it explores a narrow region of the state space. The table shows that EIS-PGAS samples from a satisfactorily wider state space than the other three methods. In particular, we notice that MM-MH gives corr(β, ζ) = −0.79 and corr(φ, σ) = −0.64, roughly triple the corresponding values under EIS-PGAS. This means that one needs to base inference about the posterior distribution on many more samples if MM-MH is applied. In addition, MM-MH achieves accuracy, but its Gibbs sampler, iterating over h_{1:T}|y_{1:T}, W_{1:T}, θ and W_{1:T}|y_{1:T}, h_{1:T}, θ based on a local approximation of the conditional posterior distribution, delivers a Markov chain with less appealing mixing than the EIS-based methods, which rely on a global approximation to the joint posterior distribution.
We emphasise that the efficiency of a multivariate sampler and the accuracy of calculated
marginal likelihood crucially depend on the EIS-PGAS sampler constructed for the univariate
model. Recall that with marginalisation of factors, the factor SV model boils down to n +
p individual univariate models. This means that the results of the simulation study on
the high-dimensional factor SV model, which is detailed in the Appendix, are expected to be
comparable to the univariate case.
5 Empirical study
With the emergence of powerful computing technologies, traditional hedge funds and investment
institutions are turning their attention to portfolios of increasing dimensionality, where human
judgement of investment strategy plays a diminishing role. One important task is risk
management, i.e. tracking and forecasting the covariance matrix of the portfolio components
over time. In this section, we demonstrate an application using our proposed high-dimensional
factor SV model with the EIS-PGAS estimation method, followed by an exercise on forecasting
the covariance matrix and determination of VaR.
5.1 Data and models
The dataset we use contains 80 equity return series, which is, as far as we know, among the largest in
the SV literature. It consists of weekly stock returns of constituents of the S&P 100 index5, obtained from
Yahoo Finance. The sample covers the 1998 Asian crisis, the 2000 dot-com bubble, the
2008 U.S. financial crisis and the 2012 European debt crisis, with a total length of T = 1095
trading weeks. The model is formulated as in (16). Additionally, shrinkage priors for ρ and β are
applied because we presume that the leverage effect and skewness are not ubiquitous across factors
and individual assets. Table 3 shows the values of the Bai and Ng (2002) criteria for determining
the number of factors. Both IC p1 and IC p2 suggest 4 factors, while IC p3 suggests 6. Later
we show that based on the IS2 marginal likelihood criterion the number of factors chosen is 4,
supporting the former two non-Bayesian criteria.
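For reference, the three IC p criteria can be computed directly from principal components of the return panel. The sketch below is our own illustration, not the authors' code; `Y` denotes the T × n demeaned return panel and the penalties follow Corollary 1 of Bai and Ng (2002):

```python
import numpy as np

def bai_ng_ic(Y, k_max=8):
    """Bai & Ng (2002) IC_p criteria for selecting the number of factors.
    Y is a (T, n) demeaned panel; factors are estimated by principal
    components, and each criterion is minimised over k = 1, ..., k_max."""
    T, n = Y.shape
    evals, evecs = np.linalg.eigh(Y @ Y.T)
    evecs = evecs[:, np.argsort(evals)[::-1]]     # descending eigenvalue order
    c = min(n, T)
    results = {"IC_p1": [], "IC_p2": [], "IC_p3": []}
    for k in range(1, k_max + 1):
        F = np.sqrt(T) * evecs[:, :k]             # PC factor estimates, F'F/T = I
        Lam = Y.T @ F / T                         # OLS loadings given F
        V = np.mean((Y - F @ Lam.T) ** 2)         # average squared residual
        results["IC_p1"].append(np.log(V) + k * (n + T) / (n * T) * np.log(n * T / (n + T)))
        results["IC_p2"].append(np.log(V) + k * (n + T) / (n * T) * np.log(c))
        results["IC_p3"].append(np.log(V) + k * np.log(c) / c)
    return {name: 1 + int(np.argmin(vals)) for name, vals in results.items()}
```

With strong factors all three criteria typically agree; as in Table 3, they can diverge when the marginal factor is weak, since IC p3 carries the lightest penalty.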
5.2 Estimation results
Figure 4 shows the posterior mean estimates of the loadings on the four factors with their associated
standard deviations. The first observation is that most loadings tend to have a positive sign
with similar magnitudes, which can be interpreted as each individual asset's sensitivity to market
5The 20 stocks in the index composition that are not chosen have too short a trading history.
Table 3: Number of Factors by Bai and Ng Criteria

number of factors     IC p1        IC p2        IC p3
4                    -6.52148     -6.51769     -6.53370
5                    -6.52131     -6.51658     -6.53659
6                    -6.52012     -6.51444     -6.53846

The three criteria correspond to Corollary 1 and equation (9) in Bai and Ng (2002). The minimising entry of each column (4 for IC p1 and IC p2, 6 for IC p3) indicates the number of factors determined by the associated criterion.
movement measured by the extracted factor. The fact that model (16) also takes into account
the factor SV and the “shock variable” in the mean suggests that market shocks Wj,t are part
of systematic risk with stochastic weights that are proportional to exp(hj,t/2), j = 1, ..., 4.
For the 2-nd and the 4-th factor, we notice that many loadings are close to zero with distinct
exceptions, suggesting the drivers for these two factors may come from a few asset returns in
the index composition. But we should notice that the proposed factor model is identified up to
a rotation, so sparsity of factor loadings does not necessarily mean lack of systematic content
for these factors.
Figure 4: Posterior estimate of the factor loadings Λ. From top to bottom are the posterior mean
(bar) and standard deviation (dot) of the loadings on the 1-st, 2-nd, 3-rd and 4-th factors. Due to identification
restriction, the upper diagonal block of Λ is fixed.
Table 4 summarises the sample mean and standard deviation of posterior mean estimates
of autoregressive coefficient φ, volatility of volatility σ, unconditional mean µ and the d.o.f.
Table 4: Summary of Posterior Estimates

pars        mean              s.t.d.           C.I. lb           C.I. ub           skewness
φ       0.971 (0.033)     0.007 (0.004)    0.962 (0.037)     0.977 (0.033)    -0.536 (0.279)
σ       0.197 (0.095)     0.019 (0.010)    0.184 (0.082)     0.209 (0.091)     0.082 (0.017)
µ      -7.578 (0.454)     0.328 (0.109)   -7.670 (0.468)    -7.308 (0.449)    -0.404 (0.396)
ζ      28.885 (1.603)     5.436 (0.371)   24.265 (1.489)    31.434 (1.708)     0.409 (0.169)
The table shows the mean of the 84 posterior means, standard deviations, lower (lb) and upper (ub) bounds of the 95%
credible interval, and skewness obtained from the MCMC samples after burn-in. In brackets is the associated
standard deviation of each posterior statistic across the 84 series.
parameter ζ among the 84 series consisting of 4 factor processes and 80 individual asset-specific
processes. The mean of φ is 0.97 with a quite small standard deviation, which indicates that most of the SV
series ht and lt are quite persistent. We find only 5 assets with φli smaller than 0.9,
and 21 assets with φli smaller than 0.95. The mean volatility of volatility σ is in line with
many other studies of univariate SV models. There is, however, one asset with a near-zero σ,
which means that the four factors together account for most of its first and second order variation.
The sample skewness of φ, σ and ζ is most likely due to the parameter transformation applied;
the sample mean of the skewness of µ, however, is found to result from five very left-skewed µ's,
of which four are asset-specific and one is from h4,t, the SV of f4,t. This can also be seen from
the large standard deviation of the skewness estimates, suggesting the estimation procedure
produces quite different posterior distributions p(µ|·) among the 84 series.
Table 5 reports statistics summarising the inefficiency factors obtained using the pro-
posed EIS-PGAS algorithm, including the minimum, maximum, median and interquartile range (IQR). For
the parameters pertaining to the 84 SV models, inefficiency factors are very comparable to those
obtained in the simulation study of the univariate model. This supports our previous claim that
once the multivariate model is split into n + p individual univariate models, EIS-PGAS is able
to produce an MCMC sample almost as efficiently as in the case of a univariate model. For
such a complex model structure with more than 800 parameters to estimate, the fact that EIS-PGAS
delivers IE(φ) and IE(µ) smaller than 20 in almost all 84 individual Markov chains suggests
our method is highly efficient. Also, only one IE(σ) is larger than 50, and the tight IQR
suggests fast decay of the autocovariance of the Markov chain. IE(ζ) tends to be larger
than for the previous three parameters, similar to the case of the univariate model. The last four
columns give the inefficiency factor of loadings on the four factors. Remember that the sampling
Table 5: Summary of Inefficiency Factors

statistics     φ        σ        µ        ζ        Λ1       Λ2       Λ3       Λ4
median       11.32    27.79     6.35    67.24    14.86    45.24    33.64    69.70
min           5.67    22.37     3.77    32.85     8.94    25.53    21.71    42.57
max          21.27    52.46    13.38    94.21    19.63    87.04    50.38   108.67
IQR          13.68    30.56     8.10    47.16     6.34    51.90    18.66    57.83

Based on 84 estimates for each parameter, the table shows the summary of inefficiency factors delivered by the proposed EIS-PGAS algorithm. Λj stands for the loadings on the j-th factor. IQR is the interquartile range.
scheme for Λ we use is a standard MH algorithm based on a Laplace approximation; nevertheless, with the
help of marginalisation of factors, the factor loadings can be sampled efficiently, a result similar to that of
Chib et al. (2006). Compared to IE(Λ2) and IE(Λ4), the tighter IQR of IE(Λ1) and IE(Λ3)
may be caused by the many near-zero loadings on the 2-nd and 4-th factors seen in Figure 4.
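For reference, an inefficiency factor is one plus twice the sum of the sample autocorrelations of a chain of posterior draws. A minimal sketch (our own helper, not the authors' code; truncating at the first negative autocorrelation is one common practical rule):

```python
import numpy as np

def inefficiency_factor(chain, max_lag=500):
    """Inefficiency factor IF = 1 + 2 * sum_k rho_k of an MCMC chain,
    where rho_k is the lag-k sample autocorrelation. IF near 1 means the
    chain mixes almost like i.i.d. draws; the sum is truncated at the
    first negative autocorrelation."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    T = len(x)
    var = np.dot(x, x) / T
    total = 0.0
    for k in range(1, min(max_lag, T - 1)):
        rho_k = np.dot(x[:T - k], x[k:]) / (T * var)
        if rho_k < 0:                 # truncation rule
            break
        total += rho_k
    return 1.0 + 2.0 * total
```

As a sanity check, an i.i.d. chain gives IF ≈ 1, while an AR(1) chain with coefficient 0.9 gives IF ≈ (1 + 0.9)/(1 − 0.9) = 19.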
Figure 5 illustrates the posterior shrinkage estimates of the leverage effect ρ and the skewness
parameter β, sorted in ascending order. The left and middle panels show that both the leverage effect
and asymmetry carry some systematic content. Although ρf2 and ρf3 are between −0.1 and 0,
ρf1 and ρf4 are clearly non-zero and negative, shared by all assets. Asymmetry from the second
factor contributes the most to the observed return asymmetry of individual assets. Comparing
the shrinkage estimates of β with those of ρ, we can see that apart from the factor leverage effect, many
return series still possess some asset-specific leverage effect: the right panel shows that many
individual assets have a posterior probability of zero leverage smaller than 0.8. This is different
from the case of asymmetry, where βf accounts for almost all asymmetry in asset returns.
Consequently, the systematic asymmetry of asset returns may imply a risk premium on the
third moment of the "market portfolio".
The left graph of Figure 6 illustrates the posterior mean estimates of the SV of f1,t and
f4,t, i.e. the estimates of exp(h1,t/2) and exp(h4,t/2). The volatility of f4,t is extremely
high in 2008, suggesting its role as a crisis factor. The middle graph shows the estimated
time-varying volatility of three chosen asset returns, computed via

σi,t = ( ∑_{j=1}^{4} Λ²ij exp(hj,t) + exp(li,t) )^{1/2},

where i ∈ {1, ..., 80} is the index of the asset. That the three volatility series behave very differently
highlights the room for modelling an asset-specific SV process on top of the factor SV, which we believe
Figure 5: Sorted posterior estimate of ρ and β. Left / Middle: posterior mean estimate of leverage effect
ρ / skewness parameter β with 95% credible interval; Right: posterior zero probability of β against that of ρ.
Coloured dots indicate the parameters corresponding to four factors.
has profound consequences for covariance matrix forecasting and risk evaluation. Furthermore, it
is known that asset returns tend to co-move during periods of market turmoil, and we can see
this in the right graph, which shows the implied time-varying correlations among the three chosen
asset returns, calculated as

Corrij,t = ∑_{k=1}^{4} Λik Λjk exp(hk,t) / (σi,t σj,t).

In the year of the financial crisis the three correlation series start climbing, one of them even
shooting up to over 0.4. Yet we have to notice that outside the crisis period the correlations can
be low in absolute value, and the correlations have different magnitudes and volatilities. The latter
indicates that if one uses models with equicorrelation, portfolio management may not be optimal
in terms of diversification.
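Both quantities follow directly from the implied covariance decomposition Ωt = Λ diag(exp(ht)) Λ′ + diag(exp(lt)). A sketch of the computation from posterior mean estimates (the array shapes are our own convention, not the authors' code):

```python
import numpy as np

def implied_vols_and_corr(Lam, h, l, i, j):
    """Implied time-varying volatilities and pairwise correlation from the
    factor SV decomposition Omega_t = Lam diag(exp(h_t)) Lam' + diag(exp(l_t)).

    Lam : (n, p) posterior mean factor loadings
    h   : (T, p) posterior mean factor log-volatilities h_{k,t}
    l   : (T, n) posterior mean idiosyncratic log-volatilities l_{i,t}
    """
    fac_var = np.exp(h)                               # (T, p) factor variances
    sig_i = np.sqrt(fac_var @ Lam[i] ** 2 + np.exp(l[:, i]))
    sig_j = np.sqrt(fac_var @ Lam[j] ** 2 + np.exp(l[:, j]))
    cov_ij = fac_var @ (Lam[i] * Lam[j])              # systematic covariance
    return sig_i, sig_j, cov_ij / (sig_i * sig_j)
```

At each t this reproduces exactly the entries of the implied covariance matrix, so the whole correlation path costs only O(Tp) per asset pair.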
[Figure 6 here; sample period 01/95–07/17. Left panel: SV of the 1st and 4th factors. Middle panel: volatilities of The Dow Chemical Company, Walgreens Boots Alliance, Inc. and PayPal Holdings. Right panel: implied pairwise correlations among these three assets.]
Figure 6: Posterior mean of factor and stochastic volatility process. Left: SV exp(hj,t/2) from the
j = 1, 4-th factors; Middle: Volatility σi,t of three chosen asset returns; Right: Implied time-varying correlations
Corrij,t among the three asset returns.
The extracted factors are model-based, so one may be interested in their relationship with
market indicators such as the classic Fama-French factors (Fama and French, 1993). To examine
this, we run simple linear regressions of the filtered estimates of the four model-based factors on each
of the three Fama-French factors, i.e. Rm-Rf, SMB, and HML, over the same sample
period; the t-statistics are shown in Figure 7. From the figure, it can be seen that the
variation of Rm-Rf, SMB and HML is explained by the 2-nd, the 1-st and the 3-rd
factor, respectively. There is, however, no statistical evidence that any Fama-French factor is jointly explained by
multiple model-based factors6. Based on this, we conjecture that each extracted factor contains
unique market information and measures a different systematic movement from the factors con-
structed ad hoc by Fama and French. This exercise can be extended to other market factors.
For example, the momentum factor in the four-factor model of Carhart (1997), an extra index
describing a stock price's tendency to keep moving in one direction, captures
systematic content outside the Fama-French three factors.
Figure 7: Explanatory content of the four factors on the three factors of Fama and French (1993). From left to right are the t-statistics of regressions of the posterior mean of f′t|1:t−1 for all t on each of the
three Fama-French factors: Rm-Rf, SMB, and HML. Red indicates a significant effect.
Table 6 shows the Bayes factors calculated via the IS2 marginal likelihood for model specifications
with different numbers of factors. The numbers of factors under consideration are between 2 and
6, in line with the rest of the literature. The model with 4 factors is preferred over all other specifications, in
particular over the models with 5 and 6 factors, the latter being the choice made by IC p3 of Bai and
Ng (2002); IC p1 also delivers almost equal values for the specifications with 5 and 6 factors.
Via the use of IS2 for calculating the marginal likelihood, we can safely choose the model with 4
factors. Other comparisons show that the model with 3 factors is slightly preferred over the 6-factor
model, and evidently preferred over the model with 5 factors.
6Notice that the model-based factors are identified up to a rotation, but joint significance is not affected by such rotations.
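Classifying a pair of IS2 log marginal likelihoods on the Jeffreys scale used in Table 6 can be sketched as follows (our own illustration; the verbal labels are a common reading of the scale, and the thresholds match the table rows):

```python
import math

def jeffreys_category(log_ml_a, log_ml_b):
    """Place the Bayes factor BF = exp(log_ml_a - log_ml_b) of model A
    against model B into the Jeffreys-scale categories of Table 6."""
    bf = math.exp(log_ml_a - log_ml_b)
    if bf > 100:
        return "> 100 (decisive)"
    if bf > 10:
        return "10-100 (strong)"
    if bf > 3.2:
        return "3.2-10 (substantial)"
    if bf > 1:
        return "1-3.2 (barely worth mentioning)"
    return "<= 1 (favours the other model)"
```

Working on the log scale avoids overflow, since IS2 log marginal likelihoods of high-dimensional models are large in magnitude.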
Table 6: Number of Factors Based on Marginal Likelihood

Jeffreys' scale    4/2   4/3   4/5   4/6   3/5   3/6   6/5
1–3.2               –     –     –     –     –     √     –
3.2–10              –     –     –     –     √     –     √
10–100              √     √     –     –     –     –     –
> 100               –     –     √     √     –     –     –

A check mark indicates that the Bayes factor, computed using the IS2 marginal likelihood, for one choice of the number of factors against another falls into the given category of Jeffreys' scale.
5.3 Dynamic portfolio management
To see how our proposed model and estimation methods may work in practice, we compare
them with another five models in terms of VaR and portfolio performance under both one-week and two-
week rebalancing. Two portfolios are considered: (1) a U.S. portfolio, which is the
dataset used in the previous application, i.e. 80 equity return series from components of the S&P
100 index; (2) an Australian portfolio containing 41 return series from components of the S&P ASX
50 index. A rolling-window exercise of size T = 600 is carried out with S = 495 out-of-sample
trading weeks. Throughout the following, our model is abbreviated as HFSV.
5.3.1 Design and alternative models
We choose competing models that also adopt a factor structure, which is usually con-
sidered a viable modelling framework when a high-dimensional dataset is of interest7.
The first model is the multivariate stochastic volatility model (MSV) of Chib et al. (2006).
The second model is the same model augmented with stochastic jumps (MSV-J). MSV-J is
formulated as

yt = Λft + Ktqt + ut,

where each factor fj,t is Gaussian with standard stochastic volatility, i.e. no leverage effect or
asymmetry. The idiosyncratic term ui,t has a Student's t error with standard stochastic volatility.
Kt is a diagonal matrix recording jump sizes at time t, and qi,t is a Bernoulli random variable. MSV
does not have the jump term. We replace the estimation method for stochastic volatility given
in the original paper with our EIS-PGAS algorithm.
7Comparisons with popular models such as the BEKK, DCC, CCC, DECO, VGARCH models and variants of them are left out, because these models do not have a factor structure. Though comparisons with them are still interesting, readers may refer to the papers of our chosen competing models and references therein.
The third model, denoted by CKL, is the factor model of Chan et al. (1999) which writes
yt = Λft + ut
Ωt = ΛVtΛ′ + Ut,
where ft is a vector of constructed (thus observed) factors, which in this exercise contains the
three Fama-French factors8, i.e. ft = ((Rm-Rf)t, SMBt, HMLt)′. The covariance matrix Vt is
computed over rolling windows, i.e.

Vt = (1/L) ∑_{l=t−L}^{t−1} fl f′l,

and Ut is the sample covariance matrix of the residuals from the asset-by-asset regressions that determine the rows of Λ.
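The rolling-window computation of Vt can be sketched as follows (our own illustration; `f` is the T × p matrix of observed factors):

```python
import numpy as np

def rolling_factor_cov(f, L):
    """Rolling second-moment matrices V_t = (1/L) * sum_{l=t-L}^{t-1} f_l f_l'
    of observed factors f (T x p), as used in the CKL model. V_t is defined
    for t >= L; earlier entries are left as NaN."""
    T, p = f.shape
    V = np.full((T, p, p), np.nan)
    for t in range(L, T):
        win = f[t - L:t]          # the L observations strictly before t
        V[t] = win.T @ win / L
    return V
```

Note that the window ends at t − 1, so Vt uses only information available when the weights for time t are formed.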
The fourth model is the dynamic factor multivariate GARCH (DFMG) model of Santos and
Moura (2014). The model also uses constructed factors but is more flexible, and given by
yt = Λtft + ut
Ωt = ΛtVtΛ′t + Ut,
λk,t+1 = λk,t + ηt,
where λk,t is the k-th element of vec(Λt), k = 1, ..., p × n, which follows a random walk. Vt and Ut
are diagonal matrices with each element evolving according to standard GARCH dynamics. ft
and ut are assumed to be Gaussian and Student's t respectively, in line with the MSV and MSV-J models. To
estimate the model, one first estimates the GARCH dynamics of ft to obtain Vt. Second,
treating Λ as constant, it can be obtained by OLS; the residuals are then passed through a GARCH-t
filter, delivering Ut. Third, given Vt and Ut for t = 1, ..., T , vec(Λt) is obtained via the Kalman
filter, and the covariance matrix of ηt is estimated by quasi-maximum likelihood.
The last model we consider is the factor copula (FCO) model of Oh and Patton (2017). This
model provides a novel way of modelling high-dimensional dependence structures and allows for
considerable flexibility. Its computational complexity is comparable to that of a factor GARCH
model such as DFMG. For ease of exposition, we leave out the model specification and forecasting
procedure for FCO; readers may refer to the original paper. We choose GARCH marginals
and a Gaussian factor model for simplicity, which implies a Gaussian conditional copula.
We consider a basic dynamic minimum-variance portfolio (MVP) problem. The MVP is
dynamic because rebalancing is allowed, and the rebalancing decision is based on the filtered estimate
of the portfolio's conditional covariance matrix. The MVP determines the n-by-1 portfolio
8The three factors for the U.S. portfolio are readily found online. We construct those for the Australian portfolio based on definitions in Fama and French (1993).
weights ωt+h|t at time t to rebalance at time t + h such that

ωt+h|t = arg min_ω ω′Ωt+h|t ω, subject to ω′ι = 1,

where ι is a vector of ones. The solution of this MVP problem is given by

ωt+h|t = Ω^{-1}_{t+h|t} ι / (ι′ Ω^{-1}_{t+h|t} ι).

For the HFSV, MSV and MSV-J models, Ωt+h|t is obtained via the methods in Section 3.2 9. For the CKL
model, Ωt+h|t is simply set equal to Ωt. For the DFMG and FCO models, it is straightforward
to use a GARCH-like recursive algorithm to compute Ωt+h|t.
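The closed-form MVP solution above is a one-liner; a sketch (our own illustration) using a linear solve rather than an explicit matrix inverse:

```python
import numpy as np

def minimum_variance_weights(Omega):
    """Minimum-variance portfolio weights w = Omega^{-1} iota / (iota' Omega^{-1} iota)
    for a forecast covariance matrix Omega; the weights sum to one."""
    iota = np.ones(Omega.shape[0])
    x = np.linalg.solve(Omega, iota)   # Omega^{-1} iota without forming the inverse
    return x / (iota @ x)
```

For a diagonal Ω the weights are proportional to the inverse variances, a useful sanity check.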
5.3.2 VaR and performance
One important task in portfolio management is the determination of the portfolio VaR at time t + 1
given information up to time t, VaRp,t+1|t. Given the portfolio weights ωt+1|t solved from
the MVP problem, the one-step-ahead VaR at the α% level is given by

VaRp,t+1|t(α) = ( ω′t+1|t Ωt+1|t ωt+1|t )^{1/2} F^{-1}_{yp,t+1|t}(α),

where F^{-1}_{yp,t+1|t}(α) is the α-th percentile of the distribution function of the one-step-ahead pre-
dicted portfolio return yp,t+1|t = ω′t+1|t yt+1|1:t. For the HFSV, MSV and MSV-J models, the distri-
bution function of yt+1|1:t can be readily estimated based on the particle system at time t as
in equation (28). For the other models, the conditional forecasting density can be derived straightfor-
wardly, similar to GARCH-type models.
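The formula can be sketched as follows. This is our own interpretation, not the authors' code: we standardise the predictive draws so that the percentile acts as a scale-free multiplier on the forecast volatility.

```python
import numpy as np

def portfolio_var(weights, Omega_fc, ret_draws, alpha=0.05):
    """One-step-ahead portfolio VaR: forecast portfolio volatility
    sqrt(w' Omega w) scaled by the alpha-percentile of the standardised
    predictive portfolio-return distribution.

    ret_draws : (M, n) draws of y_{t+1} from the predictive density
    (e.g. the particle system at time t)."""
    weights = np.asarray(weights, dtype=float)
    vol = np.sqrt(weights @ Omega_fc @ weights)
    port_draws = ret_draws @ weights
    z = (port_draws - port_draws.mean()) / port_draws.std()
    return vol * np.quantile(z, alpha)
```

Under a Gaussian predictive density this reduces to the familiar vol × z(α) rule, e.g. about −1.645 × vol at α = 5%; skewed and heavy-tailed predictive draws move the quantile accordingly.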
The unconditional and conditional coverage ratio tests used by Chib et al. (2006) are applied
to investigate the quality of the VaR estimates. We define the binary sequence It as

It = 1 if ω′t+1|t yt+1 < VaRp,t+1|t, and It = 0 if ω′t+1|t yt+1 ≥ VaRp,t+1|t.

It = 1 indicates a hit, or exception. Well-behaved VaR estimates mean that the sequence It
has the correct unconditional coverage ratio, i.e. E(It) = α. A likelihood ratio (LR) test for
unconditional coverage can be constructed based on the hit rate (HR) (1/T) ∑_{t=1}^{T} It. According
9The sampler for MSV-J has one extra step to draw Kt and qt. See Chib et al. (2006) for details.
Table 7: Quality of VaR Estimates for the U.S. Portfolio

S&P 100           α = 0.01                         α = 0.05
          HR      LRuc   LRind   LRcc      HR      LRuc   LRind   LRcc
HFSV     0.011    0.71   0.75    0.89     0.051    0.86   0.20    0.43
MSV      0.017    0.13   0.68    0.30     0.046    0.71   0.46    0.71
MSV-J    0.012    0.69   0.37    0.62     0.051    0.86   0.87    0.97
CKL      0.019    0.07   0.27    0.10     0.084    0.00   0.07    0.00
DFMG     0.023    0.01   0.62    0.02     0.121    0.00   0.12    0.00
FCO      0.009    0.70   0.23    0.45     0.058    0.36   0.34    0.42

The table shows p-values of the coverage ratio tests for the U.S. portfolio. Portfolio weights are updated weekly based on one-step-ahead forecasts of the covariance matrix. α is the nominal level of VaR. Shaded cells indicate rejection of the coverage ratio test at the 10% level.
Table 8: Quality of VaR Estimates for the Australian Portfolio

ASX 50            α = 0.01                         α = 0.05
          HR      LRuc   LRind   LRcc      HR      LRuc   LRind   LRcc
HFSV     0.014    0.44   0.24    0.37     0.051    0.86   0.37    0.66
MSV      0.015    0.25   0.51    0.42     0.054    0.72   0.46    0.71
MSV-J    0.012    0.69   0.36    0.61     0.048    0.85   0.21    0.45
CKL      0.014    0.44   0.48    0.58     0.036    0.12   0.08    0.06
DFMG     0.018    0.07   0.09    0.04     0.079    0.00   0.12    0.00
FCO      0.009    0.73   0.11    0.26     0.053    0.71   0.20    0.41

The table shows p-values of the coverage ratio tests for the Australian portfolio. Also see the descriptions of Table 7.
to Christoffersen (1998), for dynamic models the conditional coverage ratio, which additionally
requires serial independence of It, is more relevant. The test statistics LRuc for unconditional coverage
and LRind for serial independence can be constructed from It, and both are asymptotically
χ2(1)-distributed. The combined statistic LRcc = LRuc + LRind for testing conditional coverage
is asymptotically χ2(2)-distributed.
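A sketch of these LR tests (our own implementation of the standard Christoffersen statistics, not the authors' code; it assumes the hit sequence contains hits, non-hits and every transition type, so no log(0) terms arise):

```python
import math
import numpy as np

def coverage_tests(hits, alpha):
    """Christoffersen (1998) coverage tests for a 0/1 VaR hit sequence.
    Returns (statistic, p-value) for LRuc, LRind and LRcc."""
    hits = [int(h) for h in np.asarray(hits)]
    T, n1 = len(hits), sum(hits)
    n0, pi_hat = T - n1, sum(hits) / len(hits)
    # unconditional coverage: H0 is E(I_t) = alpha
    lr_uc = -2.0 * (n0 * math.log(1 - alpha) + n1 * math.log(alpha)
                    - n0 * math.log(1 - pi_hat) - n1 * math.log(pi_hat))
    # first-order Markov transition counts for the independence test
    pairs = list(zip(hits[:-1], hits[1:]))
    n00, n01 = pairs.count((0, 0)), pairs.count((0, 1))
    n10, n11 = pairs.count((1, 0)), pairs.count((1, 1))
    pi01, pi11 = n01 / (n00 + n01), n11 / (n10 + n11)
    pi = (n01 + n11) / (T - 1)

    def ll(a, b, p):  # Bernoulli log-likelihood with a zeros and b ones
        return a * math.log(1 - p) + b * math.log(p)

    lr_ind = -2.0 * (ll(n00 + n10, n01 + n11, pi)
                     - ll(n00, n01, pi01) - ll(n10, n11, pi11))
    lr_cc = lr_uc + lr_ind

    def chi2_sf(x, df):  # chi-squared survival function for df in {1, 2}
        x = max(x, 0.0)
        return math.erfc(math.sqrt(x / 2.0)) if df == 1 else math.exp(-x / 2.0)

    return {"LRuc": (lr_uc, chi2_sf(lr_uc, 1)),
            "LRind": (lr_ind, chi2_sf(lr_ind, 1)),
            "LRcc": (lr_cc, chi2_sf(lr_cc, 2))}
```

When the realised hit rate equals α exactly, LRuc is zero; clustering of hits drives LRind, and LRcc penalises either failure.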
Tables 7 and 8 report the p-values of the LR tests for the U.S. and Australian portfolios, with shaded
cells indicating rejection at the 10% level. Comparing the HRs given by the different models, HFSV is the
most accurate in estimating VaR, except for the Australian portfolio targeting VaR at the
1% nominal level. MSV is always less accurate than MSV-J, highlighting the need for modelling
"jumps" or "shocks". Though MSV incorporates Student's t-distributed idiosyncratic errors, its
performance implies that modelling only asset-specific "shocks" is insufficient. FCO also estimates
VaR well, perhaps with the exception of the U.S. portfolio targeting the 5% nominal level, though
the test results do not reject its validity.
Interestingly, all shaded cells come from either CKL or DFMG, both of which use constructed
factors. We conjecture this has to do with rebalancing. One updates the portfolio
weights based on the covariance matrix forecast, on which the estimation of VaR critically depends,
and constructed factors are proxies that may not adequately reveal the unobserved factor structure.
As a result, the forecast gets contaminated when a certain proportion of assets deviates from
the factors. Additionally, HFSV is the only model taking into account asymmetry and the leverage
effect, which are believed to influence the HR.
Besides risk management, portfolio performance is also evaluated based on the Sharpe ratio (SR)
and the information ratio (IR). SR measures the average return per unit of portfolio return
variability. A portfolio that is rebalanced on an h-week basis has

SR(h) = µ(h)/σ(h), where µ(h) = (1/(S − h)) ∑_{s=1}^{S−h} ω′_{T+s+h|T+s} y_{T+s+h},

σ²(h) = (1/(S − h)) ∑_{s=1}^{S−h} ( ω′_{T+s+h|T+s} y_{T+s+h} − µ(h) )².
IR is often used to set portfolio constraints for managers, such as tracking risk limits. It measures
how much excess return can be generated from the amount of excess risk taken relative to a chosen
benchmark. Here we choose the S&P 100 and ASX 50 index returns as benchmarks for the U.S. and
Australian portfolios respectively. The IR is given by

IR(h) = µ(h)/σ(h), where µ(h) = (1/(S − h)) ∑_{s=1}^{S−h} ω′_{T+s+h|T+s} ( y_{T+s+h} − µ_{B,T+s+h} ),

σ²(h) = (1/(S − h)) ∑_{s=1}^{S−h} ( ω′_{T+s+h|T+s} ( y_{T+s+h} − µ_{B,T+s+h} ) − µ(h) )²,

where µB,t is the benchmark return at time t.
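Given the realised out-of-sample return series, both ratios are simply a mean over a standard deviation; a minimal sketch (our own helper):

```python
import numpy as np

def sharpe_and_information_ratio(port_ret, bench_ret):
    """Out-of-sample Sharpe ratio and information ratio as defined above.
    port_ret holds the realised portfolio returns w'_{T+s+h|T+s} y_{T+s+h};
    bench_ret holds the benchmark returns mu_{B,t}. Each ratio is the mean
    of the relevant weekly return series divided by its standard deviation."""
    port_ret = np.asarray(port_ret, dtype=float)
    excess = port_ret - np.asarray(bench_ret, dtype=float)
    sr = port_ret.mean() / port_ret.std()
    ir = excess.mean() / excess.std()
    return sr, ir
```

Note the standard deviations are computed over the same S − h out-of-sample returns as the means, matching the formulas above.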
Model comparisons are carried out in terms of the portfolios' average weekly returns, variances,
SR and IR over the out-of-sample period. Table 9 shows that for the U.S. portfolio
the equally weighted portfolio gives the highest variance and one of the lowest mean returns. This
suggests that the equally weighted portfolio is inefficiently managed and lies inside
the conditional efficient frontier implied by the different models, in the bottom area of the
conditional feasible set10. This is in contrast to the Australian portfolio summarised in Table
10. Among all models, HFSV delivers the lowest portfolio return variance, and it is the only
10The efficient frontier and feasible set are conditional because the mean and covariance matrix of asset returns at time t + h are determined conditional on information up to time t.
Table 9: The U.S. Minimum-Variance Portfolio Performance

S&P 100              h = 1                                  h = 2
          µ × 10²   σ² × 10⁴    SR      IR       µ × 10²   σ² × 10⁴    SR      IR
EquWgt     0.150      7.385    0.055   0.041      0.152      7.398    0.056   0.043
HFSV       0.205      2.301    0.135   0.033      0.188      2.241    0.126   0.046
MSV        0.187      2.462    0.119   0.036      0.174      2.520    0.110   0.051
MSV-J      0.183      2.209    0.123   0.052      0.186      2.534    0.117   0.032
CKL        0.146      4.671    0.068   0.020      0.160      4.215    0.078   0.016
DFMG       0.155      5.205    0.068   0.037      0.175      4.813    0.080   0.022
FCO        0.174      2.911    0.102   0.033      0.203      3.116    0.115   0.040
The table shows the average weekly MVP returns and variances for the U.S. portfolio under different models. EquWgt
denotes an equally weighted portfolio. SR and IR are also reported, where the latter is relative to the S&P 100 index re-
turn. One- and two-week rebalancing policies are considered. Shaded cells indicate the best performer: lowest
variance, highest mean, highest SR, or highest IR.
Table 10: The Australian Minimum-Variance Portfolio Performance

ASX 50               h = 1                                    h = 2
          µ × 10²   σ² × 10⁴    SR       IR        µ × 10²   σ² × 10⁴    SR       IR
EquWgt    -0.128      6.077   -0.052   -0.162      -0.130      6.084   -0.053   -0.161
HFSV      -0.086      2.164   -0.063    0.119      -0.106      2.043   -0.057    0.126
MSV       -0.147      2.283   -0.097    0.124      -0.134      2.764   -0.081    0.101
MSV-J     -0.132      2.280   -0.087    0.094      -0.177      2.655   -0.109    0.025
CKL       -0.115      4.455   -0.055   -0.174      -0.127      4.860   -0.058   -0.153
DFMG      -0.190      3.037   -0.109   -0.099      -0.194      3.675   -0.101   -0.231
FCO       -0.180      2.634   -0.111    0.028      -0.146      2.648   -0.090    0.062

The table shows the MVP performance for the Australian portfolio. Also see the descriptions of Table 9.
model achieving a mean return higher than the equally weighted portfolio under both weekly and
biweekly rebalancing. This means that for the other models, the equally weighted portfolio lies in
the upper half of their conditional feasible sets. For the U.S. portfolio rebalanced weekly, HFSV
delivers the second lowest variance, slightly higher than MSV-J. This is because the jumps in
MSV-J absorb larger variations, though its variance under a biweekly
rebalancing policy becomes larger than that of HFSV. Another observation is that the return variances
clearly fall into two groups. The first includes HFSV, MSV, MSV-J and FCO, whose factors are
model- and data-based. The second, showing larger variances, includes CKL and DFMG, which
use constructed factors. This indicates that the conditional efficient frontier implied by the first
group of models lies to the left of that implied by the second group.
Importantly, HFSV delivers the highest SR for the U.S. portfolio under both rebalancing
policies, with MSV-J its main competitor. Although the U.S. portfolio managed using HFSV
compensates investors the most for the risk taken, the biweekly-rebalanced Australian portfolio
suggests the superior performance of HFSV relative to the risk investors choose to take in deviating
from the benchmark, i.e. a high IR. Yet for the U.S. portfolio, the MSV and MSV-J models give
the highest IR, followed by HFSV. FCO produces a moderately-performing SR, but its deviation
from the benchmark fluctuates more, making its IR lower than the other models with unobserved
factors. One should notice that because the choice of benchmark is subjective and influences the
calculation of IR, a low IR should not be seen as decisive evidence of poor model performance.
A final remark is that the SRs and IRs are low because we only consider the MVP, which assumes
investors have infinitely large risk aversion. Should a certain degree of risk be allowed and a
certain return be required, both can increase.
6 Conclusion
We propose a high-dimensional factor stochastic volatility model with leverage effect that uses the
generalised hyperbolic skew Student's t error to address the asymmetry and heavy tails of equity re-
turns. The model is shown to be flexible enough to distinguish asset-specific mean and volatility
dynamics from common factors. With a shrinkage technique, the model helps answer the question
whether leverage effects and return asymmetry are systematic or idiosyncratic. A highly efficient
Bayesian estimation procedure to sample the hyperparameters and unobserved volatility processes
is developed, and we show that based on marginalisation of factors, factor loadings can be sam-
pled efficiently, leading to a set of individual stochastic volatility models for which particle efficient
importance sampling and refined particle Gibbs with ancestor sampling can be used. Addition-
ally, importance sampling squared accurately calculates the marginal likelihood to determine the
number of factors. Our detailed Monte Carlo study of both univariate and multivariate models
provides evidence of the successful implementation of the proposed model and method. We
apply our model to a U.S. dataset with 80 assets and find that a large proportion of return asym-
metry comes from the factors, indicating that co-skewness is a systematic phenomenon. Lastly, minimum-
variance portfolio exercises for the U.S. portfolio and an Australian portfolio show that
the estimation of VaR is very accurate using our proposed model. Under both weekly and biweekly
rebalancing policies, the model outperforms other factor models.
References
Aas, K. and Haff, I. H. (2006). The generalized hyperbolic skew Student's t-distribution. Journal of Financial Econometrics, 4(2):275–309.
Aguilar, O. and West, M. (2000). Bayesian dynamic factor models and portfolio allocation. Journal of Business & Economic Statistics, 18(3):338–357.
Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342.
Asai, M., McAleer, M., and Yu, J. (2006). Multivariate stochastic volatility: a review. Econometric Reviews, 25(2-3):145–175.
Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.
Bickel, P., Li, B., and Bengtsson, T. (2008). Sharp failure rates for the bootstrap particle filter in high dimensions. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, pages 318–329. Institute of Mathematical Statistics.
Bollerslev, T. (1987). A conditionally heteroskedastic time series model for speculative prices and rates of return. The Review of Economics and Statistics, pages 542–547.
Bollerslev, T. (1990). Modelling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH model. The Review of Economics and Statistics, pages 498–505.
Bollerslev, T., Engle, R. F., and Nelson, D. B. (1994). ARCH models. Handbook of Econometrics, 4:2959–3038.
Carhart, M. M. (1997). On persistence in mutual fund performance. The Journal of Finance, 52(1):57–82.
Carrasco, M. and Chen, X. (2002). Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory, 18(01):17–39.
Chan, J. C., Leon-Gonzalez, R., and Strachan, R. W. (2013). Invariant inference and efficient computation in the static factor model.
Chan, L. K., Karceski, J., and Lakonishok, J. (1999). On portfolio optimization: Forecasting covariances and choosing the risk model. Review of Financial Studies, 12(5):937–974.
Chib, S. (2001). Markov chain Monte Carlo methods: computation and inference. Handbook of Econometrics, 5:3569–3649.
Chib, S. and Greenberg, E. (1994). Bayes inference in regression models with ARMA(p, q) errors. Journal of Econometrics, 64(1-2):183–206.
Chib, S., Nardari, F., and Shephard, N. (2006). Analysis of high dimensional multivariate stochastic volatility models. Journal of Econometrics, 134(2):341–371.
Chib, S., Omori, Y., and Asai, M. (2009). Multivariate stochastic volatility. In Handbook of Financial Time Series, pages 365–400. Springer.
Chopin, N., Singh, S. S., et al. (2013). On the particle Gibbs sampler. CREST.
Christoffersen, P. F. (1998). Evaluating interval forecasts. International Economic Review, pages 841–862.
Clyde, M. and George, E. I. (2004). Model uncertainty. Statistical Science, pages 81–94.
Creal, D., Koopman, S. J., and Lucas, A. (2012). A dynamic multivariate heavy-tailed model for time-varying volatilities and correlations. Journal of Business & Economic Statistics.
Danielsson, J. (1998). Multivariate stochastic volatility models: estimation and a comparison with VGARCH models. Journal of Empirical Finance, 5(2):155–173.
De Jong, P. and Shephard, N. (1995). The simulation smoother for time series models. Biometrika, 82(2):339–350.
Del Moral, P. (2004). Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer.
Dempster, M. A. H., Pflug, G., and Mitra, G. (2008). Quantitative Fund Management. Chapman and Hall/CRC.
Doucet, A., De Freitas, N., and Gordon, N. (2001). An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo Methods in Practice, pages 3–14. Springer.
Doz, C., Giannone, D., and Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics, 164(1):188–205.
Durbin, J. and Koopman, S. J. (1997). Monte Carlo maximum likelihood estimation for non-Gaussian state space models. Biometrika, 84(3):669–684.
Durbin, J. and Koopman, S. J. (2000). Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(1):3–56.
Durbin, J. and Koopman, S. J. (2012). Time Series Analysis by State Space Methods. Number 38. Oxford University Press.
Engle, R. (2002). Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics, 20(3):339–350.
Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1):3–56.
Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2012). The generalized dynamic factor model. Journal of the American Statistical Association.
French, K. R., Schwert, G. W., and Stambaugh, R. F. (1987). Expected stock returns and volatility. Journal of Financial Economics, 19(1):3–29.
Geweke, J. and Tanizaki, H. (2001). Bayesian estimation of state-space models using the Metropolis–Hastings algorithm within Gibbs sampling. Computational Statistics & Data Analysis, 37(2):151–170.
Gilks, W. R., Best, N., and Tan, K. (1995). Adaptive rejection Metropolis sampling within Gibbs sampling. Applied Statistics, pages 455–472.
Jacquier, E., Polson, N. G., and Rossi, P. E. (2004). Bayesian analysis of stochastic volatility models with fat-tails and correlated errors. Journal of Econometrics, 122(1):185–212.
Jung, R. C. and Liesenfeld, R. (2001). Estimating time series models for count data using efficient importance sampling. AStA Advances in Statistical Analysis, 4(85):387–407.
Kim, S., Shephard, N., and Chib, S. (1998). Stochastic volatility: likelihood inference and comparison with ARCH models. The Review of Economic Studies, 65(3):361–393.
Koop, G., Poirier, D. J., and Tobias, J. L. (2007). Bayesian Econometric Methods. Cambridge University Press.
Koopman, S. J. and Hol Uspensky, E. (2002). The stochastic volatility in mean model: empirical evidence from international stock markets. Journal of Applied Econometrics, 17(6):667–689.
Liesenfeld, R. and Richard, J.-F. (2006). Classical and Bayesian analysis of univariate and multivariate stochastic volatility models. Econometric Reviews, 25(2-3):335–360.
Lindsten, F., Jordan, M. I., and Schön, T. B. (2014). Particle Gibbs with ancestor sampling. Journal of Machine Learning Research, 15(1):2145–2184.
Nakajima, J. (2015). Bayesian analysis of multivariate stochastic volatility with skew return distribution. Econometric Reviews, pages 1–23.
Nakajima, J. and Omori, Y. (2012). Stochastic volatility model with leverage and asymmetrically heavy-tailed error using GH skew Student's t-distribution. Computational Statistics & Data Analysis, 56(11):3690–3704.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica: Journal of the Econometric Society, pages 347–370.
Oh, D. H. and Patton, A. J. (2017). Modeling dependence in high dimensions with factor copulas. Journal of Business & Economic Statistics, 35(1):139–154.
Olsson, J. and Rydén, T. (2011). Rao-Blackwellization of particle Markov chain Monte Carlo methods using forward filtering backward sampling. IEEE Transactions on Signal Processing, 59(10):4606–4619.
Pitt, M. and Shephard, N. (1999b). Time varying covariances: a factor stochastic volatility approach. Bayesian Statistics, 6:547–570.
Pitt, M. K. and Shephard, N. (1999a). Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, 94(446):590–599.
Richard, J.-F. and Zhang, W. (2007). Efficient high-dimensional importance sampling. Journal of Econometrics, 141(2):1385–1411.
Ruiz, E. (1994). Quasi-maximum likelihood estimation of stochastic volatility models. Journal of Econometrics, 63(1):289–306.
Santos, A. A. and Moura, G. V. (2014). Dynamic factor multivariate GARCH model. Computational Statistics & Data Analysis, 76:606–617.
Scharth, M. and Kohn, R. (2016). Particle efficient importance sampling. Journal of Econometrics, 190(1):133–147.
Shephard, N. and Pitt, M. K. (1997). Likelihood analysis of non-Gaussian measurement time series. Biometrika, 84(3):653–667.
Snyder, C., Bengtsson, T., Bickel, P., and Anderson, J. (2008). Obstacles to high-dimensional particle filtering. Monthly Weather Review, 136(12):4629–4640.
Takahashi, M., Omori, Y., and Watanabe, T. (2009). Estimating stochastic volatility models using daily returns and realized volatility simultaneously. Computational Statistics & Data Analysis, 53(6):2404–2426.
Tran, M.-N., Scharth, M., Pitt, M. K., and Kohn, R. (2014). Importance sampling squared for Bayesian inference in latent variable models. Available at SSRN 2386371.
Vardi, N. (2015). Top quant hedge funds stand out with good 2015.
Watanabe, T. and Omori, Y. (2004). A multi-move sampler for estimating non-Gaussian time series models: Comments on Shephard & Pitt (1997). Biometrika, pages 246–248.
Wright, S. and Nocedal, J. (1999). Numerical Optimization. Springer Science, 35:67–68.
Yu, J. (2005). On leverage in a stochastic volatility model. Journal of Econometrics, 127(2):165–178.
Appendices
A The leverage effect multiplier
The leverage effect for the univariate SV model (1) is $\mathrm{Corr}(\nu_t, \eta_t) = \mathrm{Cov}(\nu_t, \eta_t)/\sqrt{\mathrm{Var}(\nu_t)\mathrm{Var}(\eta_t)}$, where the numerator is
$$\mathrm{Cov}(\nu_t, \eta_t) = E(\sqrt{W_t})\,\rho\sigma.$$
Since $W_t \sim IG(\tfrac{\zeta}{2}, \tfrac{\zeta}{2})$, $\tfrac{1}{\zeta}W_t$ is $IG(\tfrac{\zeta}{2}, \tfrac{1}{2})$-distributed, i.e. $\text{Inv-}\chi^2(\zeta)$-distributed. Let $\bar{W}_t = \sqrt{W_t}$; then $\tfrac{1}{\zeta}\bar{W}_t^2 \sim \text{Inv-}\chi^2(\zeta)$ with Jacobian $\tfrac{2}{\zeta}\bar{W}_t$. It follows that
$$
\begin{aligned}
E(\bar{W}_t) &= \int_0^\infty \frac{2}{\zeta}\bar{W}_t^2\,\frac{2^{-\zeta/2}}{\Gamma(\zeta/2)}\,\zeta^{(\zeta+2)/2}\,\bar{W}_t^{-(\zeta+2)} \exp\Big(-\frac{\zeta}{2\bar{W}_t^2}\Big)\, d\bar{W}_t \\
&= \frac{\sqrt{\zeta}}{2^{\zeta/2-1}\Gamma(\zeta/2)} \int_0^\infty y^{-\zeta} \exp\Big(-\frac{1}{2y^2}\Big)\, dy \\
&= \frac{\sqrt{\zeta}}{2^{\zeta/2-1}\Gamma(\zeta/2)} \int_0^\infty 2^{(\zeta-3)/2} z^{(\zeta-3)/2} \exp(-z)\, dz \\
&= \frac{\sqrt{\zeta}\,\Gamma\big((\zeta-1)/2\big)}{\sqrt{2}\,\Gamma(\zeta/2)},
\end{aligned}
$$
where we use the substitutions $y \equiv \bar{W}_t/\sqrt{\zeta}$ and $z \equiv \tfrac{1}{2}y^{-2}$. In the denominator, the variance of the generalised hyperbolic skew Student's $t$-distributed error $\nu_t$ is given by Aas and Haff (2006) (in their parametrisation, $\delta^2$ and $v$ are both equivalent to our $\zeta$), i.e.
$$\mathrm{Var}(\nu_t) = \frac{2\beta^2\zeta^2}{(\zeta-2)^2(\zeta-4)} + \frac{\zeta}{\zeta-2}.$$
With these quantities, the unconditional leverage effect multiplier can be shown to be the one
given in Section 2.1.
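Both moments are straightforward to verify numerically. The sketch below (ours, not part of the paper) simulates $W_t \sim IG(\zeta/2, \zeta/2)$ and a centred GH skew Student's t error, assuming the common parameterisation $\nu_t = \beta(W_t - \tfrac{\zeta}{\zeta-2}) + \sqrt{W_t}\,z_t$, and checks the simulated moments against the closed forms above:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)
zeta, beta = 10.0, -1.5  # illustrative d.o.f. and skewness values
n = 500_000

# W ~ IG(zeta/2, zeta/2): reciprocal of a Gamma(shape=zeta/2, rate=zeta/2) draw
W = 1.0 / rng.gamma(shape=zeta / 2, scale=2.0 / zeta, size=n)

# Closed form derived above: E(sqrt(W)) = sqrt(zeta) Gamma((zeta-1)/2) / (sqrt(2) Gamma(zeta/2))
e_sqrt_w = np.sqrt(zeta) * np.exp(gammaln((zeta - 1) / 2) - gammaln(zeta / 2)) / np.sqrt(2)

# Centred GH skew Student's t error (mean zero by construction)
z = rng.standard_normal(n)
nu = beta * (W - zeta / (zeta - 2)) + np.sqrt(W) * z

# Variance formula of Aas and Haff (2006)
var_nu = 2 * beta**2 * zeta**2 / ((zeta - 2) ** 2 * (zeta - 4)) + zeta / (zeta - 2)
```

The simulated mean of `np.sqrt(W)` and the simulated variance of `nu` should match `e_sqrt_w` and `var_nu` up to Monte Carlo error.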
B Discussion on two theorems
Theorem 1 can be proved following the arguments in Lindsten et al. (2014), who show the invariance of PGAS with a bootstrap filter and of the corresponding sampler without ancestor sampling. It may also be of interest to establish the equivalence between EIS-PGAS and EIS-PG (which yields an alternative proof). Note that neither is the same as the bootstrap filter, in which the importance density is simply $p(x_t|x_{t-1})$, so that the sampling weights are proportional to $p(y_t|x_t^i)$, i.e. independent of the ancestor trajectories. Using the notation of Section 2.2 and letting $L_\theta^M(x_{1:T}^*, B)$ for $M \geq 0$ denote the Markov kernel implied by EIS-PG on $(\mathcal{X}_{1:T}^*, \mathcal{F}_{1:T})$, we propose the following

Proposition 1. Suppose EIS is the importance density used in particle filtering for both PGAS and PG, i.e. $q(x_t|x_{t-1}, y_{1:T})$ as in (8). Then for any $x_{1:T}^* \in \mathcal{X}_{1:T}^*$,
$$K_\theta^M(x_{1:T}^*, B) = L_\theta^M(x_{1:T}^*, B), \qquad \forall B \in \mathcal{F}_{1:T}.$$
To proceed, first suppose the final product of EIS-PGAS is the $k$-th chosen sample $x_{1:T}^k$. The kernel is then given by
$$K_\theta^M(x_{1:T}^*, B) = E_{\theta, x_{1:T}^*}\,\mathbf{1}_B(x_{1:T}^k) = E_{\delta_{1:T}}\,\mathbf{1}_B(x_{1:T}^k).$$
The expectation is taken with respect to all random numbers generated in the algorithm, namely the particles $x_{1:T}$, the ancestor indices $a_{2:T}$ and the index $k$. The last equality follows from the fact that their distribution function is defined by the EIS importance density parameter vector $\delta_t = (b_t, c_t, s_t, r_t)$ (see footnote 11) as in (6), which is identical for both samplers.
Following Lindsten et al. (2014), the ancestor index can be written recursively as $\alpha_t = a_{t+1}^{\alpha_{t+1}}$ going backward from $\alpha_T = k$, and without loss of generality we take the measurable rectangle set $B = \prod_{t=1}^T B_t$ with $B_t \in \mathcal{F}_t$ for all $t = 1, \dots, T$, where $\mathcal{F}_t$ is the natural filtration; that is, $B$ belongs to a $\pi$-system generating $\mathcal{F}_{1:T}$. We can then write the two kernels as
$$K_\theta^M(x_{1:T}^*, B) = E\Big(\prod_{t=1}^T \mathbf{1}_{B_t}(x_t^{\alpha_t}) \,\Big|\, \delta_t\Big), \qquad L_\theta^M(x_{1:T}^*, B) = E\Big(\prod_{t=1}^T \mathbf{1}_{B_t}(x_t^{\beta_t}) \,\Big|\, \delta_t\Big).$$
It suffices to show that for all bounded multiplicative functionals $f(x_{1:T}) = \prod_{t=1}^T f_t(x_t)$ we have $E_{\delta_{1:T}}\big(f(x_{1:T}^{\alpha_{1:T}})\big) = E_{\delta_{1:T}}\big(f(x_{1:T}^{\beta_{1:T}})\big)$, because the EIS-PG sampler is essentially a backward simulator running forward. This can be established via backward induction following Olsson and Rydén (2011) and Lindsten et al. (2014). Suppose the claim holds for $t < T$ and all $s > t$, i.e.
$$E\Big(\prod_{s=t+1}^T f_s(x_s^{\alpha_s}) \,\Big|\, \delta_s\Big) = E\Big(\prod_{s=t+1}^T f_s(x_s^{\beta_s}) \,\Big|\, \delta_s\Big).$$
The induction hypothesis can be shown to hold following the equivalence between a backward simulator and a bootstrap filter in Olsson and Rydén (2011). To see this, recall that both EIS-PG and EIS-PGAS choose $\chi q(x_T; \delta_{T+1}) = 1$. Since this choice is arbitrary, we can let $\delta_T$ contain all zeros. As a result, $kq(x_T, x_{T-1}; \delta_T) = p(x_T|x_{T-1}, y_{T-1})$; this choice also makes $\omega_T^i$ for $i = 1, \dots, M+1$ proportional to $p(y_T|x_T)$, so $\alpha_T$ and $\beta_T$ are equally distributed. Using the arguments in the appendix of Lindsten et al. (2014) and their instrumental representation of PGAS, the induction can be completed.

$^{11}$ They are determined by the previous draw in the MCMC run, i.e. the reference trajectory $x_{1:T}^*$, and other hyperparameters $\theta$.
Proposition 1 shows that the kernels defined by EIS-PGAS and EIS-PG are equivalent. It is also interesting to see how EIS-PGAS improves the mixing of the MCMC relative to PGAS with a bootstrap filter. We do not attempt a formal proof, but from equation (11) one can see that the smaller the variance of $\omega_{t-1}^i$, the larger the probability $p(a_t^{M+1} \neq M+1)$, i.e. the probability that the ancestor path of the reference trajectory differs from its original one. EIS is designed to minimise the variance of the logarithm of the importance weight (see equations (9) and (10)), so it is expected to be nearly optimal in maximising $p(a_t^{M+1} \neq M+1)$.
Theorem 2 bounds the total variation distance between $K_\theta^M(x_{1:T}^*, B)$ and $\int_B p(x_{1:T}|y_{1:T})\,dx_{1:T}$ under the assumption that all importance weights are bounded from above by a constant $\bar{\omega}_\theta < \infty$. The proof follows from Doeblin's theorem, and uniform ergodicity can be established (Doucet et al., 2001). One may argue that the upper-bound condition on the importance weights is too strong in practice (otherwise the particle system would never degenerate). A more natural condition is to bound the variance of the importance weights by a constant. This applies particularly to our case, because the EIS importance density minimises the quadratic distance to the target density. We conjecture that the quadratic Kantorovich distance between $(K_\theta^M)^n(x_{1:T}^*, \cdot)$ and any other PG kernel without EIS remains positive even as $n \to \infty$.
C Monte Carlo study of the factor SV model
This section details a simulation study on the high-dimensional factor SV model. As shown in
Section 2, the factor SV model with n assets and p factors boils down to n + p individual SV
models which can be analysed in parallel once the factors and factor loadings are sampled. We
show that the multivariate model is able to achieve efficiency comparable to that of a univariate model, as expected, especially with the marginalisation of factors and the sampling of the factor loadings Λ based
on a Laplace approximation. In practice, it is important to apply the right degree of shrinkage
on leverage effect and skewness and to determine the number of factors. We demonstrate the
effectiveness and efficiency of picking the right model using the IS2 with PEIS method of Tran et al. (2014) and Scharth and Kohn (2016) applied to our model.
C.1 Model setup
Our baseline model has 50 assets and 8 factors, the same dimensionality as the model of Chib et al. (2006), although note that our model has more than a thousand parameters to estimate. One feature of our model is the shrinkage on leverage effect and skewness, so we also consider a DGP without leverage effect or skewness, as well as DGPs with non-zero leverage effect and skewness for all factors and asset-specific processes, i.e. containing p + n non-zero leverage effect parameters ρ and skewness parameters β. We denote the following DGPs:
• sLE sSK: some have leverage effect, and some have skewness;
• sLE aSK: some have leverage effect, and all have skewness;
• aLE sSK: all have leverage effect, and some have skewness;
• aLE aSK: all have leverage effect and skewness;
• nLE nSK: none has leverage effect or skewness.
“Some”, “all” and “none” in the above definitions refer to the p + n univariate series of the factor and asset-specific processes, i.e. $\{f_{j,t}\}_{j=1}^{p}$ and $\{u_{i,t}\}_{i=1}^{n}$ for t = 1, ..., T. For example, sLE sSK means that the simulated dataset has non-zero leverage effect and skewness in some of the p + n series, while all series in the dataset aLE sSK have leverage effect but only some have skewness.
When a dataset has leverage effect or skewness in some of the p + n univariate series, a random vector of p + n independent Bernoulli(0.5) draws serves as an index vector indicating which series have leverage effect or skewness.
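Concretely, the index construction can be sketched as follows (ours, for illustration; the parameter ranges are hypothetical, keeping only the sign restriction on β described below):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 8  # assets and factors, as in the baseline model

# Bernoulli(0.5) inclusion indicators across the p + n univariate series
# (factors first, then asset-specific processes)
has_leverage = rng.binomial(1, 0.5, size=p + n).astype(bool)
has_skewness = rng.binomial(1, 0.5, size=p + n).astype(bool)

# Draw non-zero values, then zero out the excluded series; only negative
# skewness values are kept, mirroring the DGP design for equity returns
rho = np.where(has_leverage, rng.uniform(-0.8, -0.2, size=p + n), 0.0)
beta = np.where(has_skewness, -np.abs(rng.normal(1.0, 0.5, size=p + n)), 0.0)
```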
Accordingly, we choose beta priors for the shrinkage parameters introduced in Section 2.3.1,
$$\Delta_\vartheta \sim \text{Beta}(2, 2), \qquad \Delta_\beta \sim \text{Beta}(2, 2).$$
We assume a flat normal prior for the free elements of Λ, i.e. λ_ij ∼ N(0, 10), but for the simulation study we generate those elements from N(1, 1) so that the prior is effectively non-informative.
Other hyperparameters are generated from their prior distributions given in (29), except that
only negative β’s (if not zero) are selected. Such a design aims to reflect the dynamics and
stylised facts of daily equity returns.
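As a rough illustration of how such a dataset is generated, the following sketch (ours) simulates a simplified Gaussian factor SV model, omitting leverage, skewness and heavy tails; the parameter values and the omission of the usual lower-triangular identification restriction on Λ are our simplifications:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, T = 50, 8, 2000               # assets, factors, sample length
phi, sigma, mu = 0.98, 0.15, -10.0  # illustrative AR(1) log-volatility parameters

def sv_path(T):
    # h_t = mu + phi*(h_{t-1} - mu) + sigma*eta_t, initialised at stationarity
    h = np.empty(T)
    h[0] = mu + sigma / np.sqrt(1 - phi**2) * rng.standard_normal()
    for t in range(1, T):
        h[t] = mu + phi * (h[t - 1] - mu) + sigma * rng.standard_normal()
    return h

H = np.array([sv_path(T) for _ in range(p + n)])     # one log-volatility path per series
f = np.exp(H[:p] / 2) * rng.standard_normal((p, T))  # factors f_{j,t}
u = np.exp(H[p:] / 2) * rng.standard_normal((n, T))  # asset-specific errors u_{i,t}
Lam = rng.normal(1.0, 1.0, size=(n, p))              # loadings drawn from N(1, 1)
y = Lam @ f + u                                      # simulated returns y_{i,t}
```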
Following Chib et al. (2006), we sample the factor loadings Λ with the factors ft for t = 1, ..., T marginalised out, using a tailored MH step in the spirit of Chib and Greenberg (1994). Chib et al. (2006) compare their posterior output to the result obtained by conditioning on the factors as in Pitt and Shephard (1999b) and Aguilar and West (2000). They show that in the case of 4 factors, the sampling of Λ can be 20 and 40 times more efficient than methods that sample Λ either by column or by row conditional on the factors, as measured by the inefficiency factor; in the case of 8 factors, the efficiency gain can be 80-fold. We apply their idea of using an MH sampler based on a Laplace approximation for the conditional posterior distribution of Λ, so we can expect similar efficiency gains from the marginalisation of factors. In the following, therefore, we do not compare differences in sampling efficiency resulting from the marginalisation of factors, but instead focus on the effect of the EIS proposal and the ancestor sampling used in the particle Gibbs algorithm, similar to our simulation study of the univariate SV model. In the next subsections, four estimation methods, i.e. EIS-PGAS, EIS-PG, BF-PGAS and MM-MH, implemented to analyse the p + n individual SV models, are considered and compared.
We simulate each dataset with length T = 2000. Figure 8 illustrates the 1st and 50th simulated return series, i.e. y1,t and y50,t, as well as the 1st and 8th factors, i.e. f1,t and f8,t,
with their respective SV processes l1,t, l50,t, h1,t and h8,t.

Figure 8: Simulated return series and factors with their respective log-volatility from model (16). Upper panel: return series; bottom panel: log-volatility.

Applying the initialisation and MCMC algorithm detailed in Section 2, we run the sampler for 22,000 iterations for posterior inference
with the first 2,000 burn-in samples discarded. In our experiments with EIS-PGAS, the number of MCMC iterations can be safely halved without much difference in posterior statistics or efficiency, but in order to have reliable posterior comparisons with the other three methods, we keep the number of iterations at 22,000, anticipating different degrees of sampling inefficiency for EIS-PG, BF-PGAS and MM-MH.
C.2 Estimation results
First, we discuss some estimation results from applying our proposed method, EIS-PGAS, to the most interesting dataset, sLE sSK. Figure 9 reports the posterior means and sample standard deviations of the hyperparameters of the 58 SV models for $\{f_{j,t}\}_{j=1}^{8}$ and $\{u_{i,t}\}_{i=1}^{50}$ (i.e. all parameters except Λ), together with their true DGP values. The top three graphs, from left to right, show the results for φ, σ and ρ, while the bottom three graphs, from left to right, show the results for µ, β and ζ. All x-axes correspond to the 58 individual SV models, with the first 8 coordinates indicating the respective factors and the rest relating to the asset-specific processes. We represent the true DGP value and posterior mean of each parameter by a pair of line graphs with values shown on the left y-axis of each graph, while the sample standard deviations are given by the scatter plot with values indicated by the right y-axis.
Figure 9: EIS-PGAS estimated posterior means and standard deviations of stochastic volatility model parameters for dataset sLE sSK. (i): φ; (ii): σ; (iii): ρ; (iv): µ; (v): β; (vi): ζ. Coordinates 1 to 8 on all x-axes indicate factors f_{j,t} for j = 1, ..., 8 and the rest correspond to u_{i,t} for i = 1, ..., 50. Left y-axes: parameter values; right y-axes: sample standard deviations.
The results suggest that EIS-PGAS can estimate all SV hyperparameters accurately and efficiently. The posterior means of the autoregressive parameter φ, the volatility-of-volatility parameter σ and the unconditional mean of log-volatility µ are close to their true DGP values, especially for the factors. One or two µ's and σ's may be deemed slightly away from their DGP values, but once the standard deviations are taken into account, these deviations, which result from a specific simulated sample path, are reasonably small.
From the bottom right graph of Figure 9, one can notice some discrepancies between the posterior means of the ζ's and their true DGP values, and for some of the SV models the sample standard deviations of the ζ's obtained from the Markov chain are also relatively high. The d.o.f. parameters ζ are probably the poorest estimated among all parameters, a result in line with Nakajima and Omori (2012), who apply MM-MH to model (1). In Section 4.1, however, we noted that EIS-PGAS is significantly more efficient than the three alternative methods, as seen in Tables 1 and 2. Of particular interest is the effect of the shrinkage prior assumed for the leverage effect parameter ρ and the skewness parameter β. The shrinkage is expected to detect zero leverage effect and skewness in the dataset sLE sSK automatically, similar to the variable selection setting discussed by Clyde and George (2004). The vertical lines in the top right and bottom middle graphs of Figure 9 indicate zero leverage effect or skewness for a particular individual SV process. It can be seen that whenever the DGP value is zero, EIS-PGAS effectively gives a zero posterior mean. This confirms that the shrinkage priors help determine zero leverage effect and skewness and consequently make the model more parsimonious. However, the first column of Figure 11, which shows the posterior probability of a zero parameter estimated by EIS-PGAS, also reveals that all ρ's in the upper row and β's in the lower row are “forced” to collapse towards zero, causing some near-zero leverage effect and skewness parameters to be shrunk to zero. We find, however, that the cost of this slight over-shrinkage is minor when applying IS2 to calculate the marginal likelihood of a dataset and the associated Bayes factors.
We report the posterior results for the 1st, 4th, 6th and 8th factor loadings, i.e. the respective columns of Λ, in Figure 10. True DGP values and posterior means are illustrated by the bar charts with values corresponding to the left y-axis, while sample standard deviations are shown by the scatter plots against the right y-axis. It is easy to see that EIS-PGAS is able to estimate the factor loadings very accurately under a flat prior. Although our proposed factor SV model takes a much more complex form than that of Chib et al. (2006), we reach the same conclusion that the estimation efficiency for the factor loadings is mainly due to the marginalisation of factors when sampling the loading matrix Λ based on a Laplace approximation. Furthermore, it is not affected by the presence of the leverage effect, skewness and heavy-tailedness modelled for the factor dynamics.
Table 11 shows the correlation between the posterior means of each parameter vector and its true DGP values, with the mean absolute deviations in brackets as a measure of estimation accuracy. The first row of the first panel in Table 11 shows these statistics for EIS-PGAS
applied to sLE sSK. We can see that, except for the ζ's, which have a correlation coefficient of 0.85, all parameters are highly correlated with their DGP counterparts, with coefficients of at least 0.94. This suggests that EIS-PGAS is capable of accurately sampling the parameters of both the factor and asset-specific processes from the joint posterior distribution.

Figure 10: EIS-PGAS estimated posterior means and standard deviations of factor loadings for dataset sLE sSK. From top to bottom: loadings on the 1st, 4th, 6th and 8th factor. Left y-axes: parameter values; right y-axes: standard deviations.
C.3 Comparisons among methods
Among the four estimation methods, BF-PGAS is the easiest to implement, while the methods involving EIS are more complicated, as one needs to build p + n importance densities in each run of the MCMC sampler for all SV series and the inverse gamma mixture component. The MM-MH method of Nakajima and Omori (2012) builds on the classic multi-move sampler of Shephard and Pitt (1997) and works satisfactorily in the univariate case, as demonstrated in Section 4.1. The following shows the estimation efficiency and accuracy for the multivariate extension.

Table 11 summarises the correlation between the posterior means of all parameters estimated by the four methods and their DGP values under the different datasets. The mean absolute deviations between the parameter estimates and their DGP values are also reported in the table. Both statistics serve as metrics for accuracy. EIS-PGAS and EIS-PG achieve the highest correlations for the parameter estimates, and under all datasets EIS-PGAS works better than the others, with only two correlation coefficients below 0.9 and none smaller than 0.8. The mean absolute deviations given by these two methods are also the smallest among the four, especially for the d.o.f. parameter ζ, which is the poorest estimated parameter for all methods and datasets. For
example, under the dataset sLE sSK, the mean absolute deviation for ζ given by these two methods is half of that given by MM-MH, and one fifth of that given by BF-PGAS. It is thus evident that the EIS component of the algorithm contributes to the estimation accuracy.
Ancestor sampling also improves accuracy slightly, as EIS-PGAS gives somewhat smaller mean absolute deviations than EIS-PG in most cases, except for ρ under aLE aSK. Further evidence that ancestor sampling may help improve accuracy is that the correlation coefficients given by EIS-PGAS seem to fluctuate less across datasets than those given by EIS-PG, and so do the mean absolute deviations. Although Tables 1 and 2 show that for a univariate SV model the ancestor sampling algorithm renders estimates more accurate, further study is needed to pin down its effect on accuracy for the high-dimensional factor SV model. In contrast to its performance in estimating a univariate model, MM-MH does not deliver correlation coefficients as high as EIS-PG(AS). The autoregressive coefficient φ, which is relatively easy to estimate, shows a correlation lower than 0.9 under sLE sSK and sLE aSK. The correlation for the unconditional mean µ is also lower than 0.9 under aLE sSK and aLE aSK, while under all datasets EIS-PG(AS) estimates µ with a correlation always higher than 0.9. In terms of ζ, MM-MH is less than satisfactory, with the highest correlation smaller than 0.85 and the lowest not exceeding 0.7. The mean absolute deviations also suggest that MM-MH is outperformed by EIS-PG(AS). For example, under aLE sSK, MM-MH gives a mean absolute deviation for µ larger than 1, whereas EIS-PGAS and EIS-PG give only 0.28 and 0.29, respectively.
BF-PGAS is the worst performing estimation method, with correlation coefficients much lower and mean absolute deviations much higher than the other three methods. The mean absolute deviations for φ, ρ and ζ are high compared with their parameter values, indicating that the method fails for those parameters. Although one may argue that the correlation coefficient of 0.91 for φ given by BF-PGAS under nLE nSK does not differ much from the 0.97 given by EIS-PG(AS), we note that the mean absolute deviation given by BF-PGAS is 0.1, while the corresponding value given by EIS-PGAS is only 0.01. Considering that the autoregressive coefficient is often around 0.98, as shown in the top left graph of Figure 9, the posterior means resulting from BF-PGAS appear to carry a consistent and large bias. This holds
true also for other parameters. The inaccuracy of BF-PGAS is likely due to the dimensionality involved. The asymptotic results of Snyder et al. (2008) and Bickel et al. (2008) show that the inevitable impoverishment of particle quality, and the tendency of the particle system to collapse as the step t moves away from the initialisation t = 0, arise because the number of particles
cannot scale exponentially with the dimension n of the observations, and the bootstrap filter suffers from the sharpest collapsing rate. In our model, n = 50, and the bootstrap filter would require millions of particles to avoid collapse, which limits its practical use for our model. As a result, resampling has to take place at every t, and a direct consequence is that BF-PGAS becomes highly inefficient and inaccurate.

Table 11: Accuracy Comparisons of Different Methods Under Different Datasets

EIS-PGAS
            φ          σ          ρ          µ           β          ζ            Λ
sLE sSK   .94 [.01]  .94 [.01]  .98 [.07]  .95 [.21]   .98 [.28]  .85 [3.89]   .99 [.11]
sLE aSK   .94 [.01]  .92 [.01]  .96 [.04]  .95 [.18]   .99 [.35]  .91 [5.34]   .99 [.13]
aLE sSK   .95 [.02]  .97 [.02]  .99 [.08]  .97 [.28]   .97 [.24]  .88 [3.07]   .98 [.09]
aLE aSK   .97 [.01]  .96 [.02]  .99 [.06]  .98 [.18]   .99 [.26]  .95 [4.41]   .99 [.12]
nLE nSK   .96 [.01]  .91 [.02]  .95 [.01]  .97 [.29]   .92 [.08]  .82 [4.27]   .98 [.06]

MM-MH
sLE sSK   .87 [.09]  .81 [.02]  .91 [.15]  .92 [.33]   .88 [.38]  .73 [9.01]   .97 [.24]
sLE aSK   .84 [.11]  .96 [.02]  .97 [.07]  .95 [.84]   .92 [.67]  .67 [11.91]  .93 [.34]
aLE sSK   .91 [.03]  .88 [.01]  .89 [.12]  .88 [1.11]  .93 [.56]  .81 [8.49]   .98 [.19]
aLE aSK   .93 [.03]  .92 [.02]  .87 [.11]  .89 [.78]   .91 [.44]  .74 [5.76]   .97 [.40]
nLE nSK   .93 [.16]  .90 [.02]  .97 [.03]  .93 [.27]   .95 [.12]  .83 [7.63]   .99 [.09]

EIS-PG
sLE sSK   .93 [.01]  .92 [.02]  .99 [.08]  .94 [.27]   .97 [.22]  .88 [4.12]   .99 [.07]
sLE aSK   .96 [.02]  .91 [.01]  .95 [.09]  .94 [.31]   .98 [.52]  .93 [8.69]   .99 [.03]
aLE sSK   .91 [.04]  .94 [.02]  .94 [.10]  .96 [.29]   .94 [.32]  .84 [6.11]   .99 [.10]
aLE aSK   .97 [.02]  .92 [.03]  .98 [.06]  .97 [.62]   .98 [.43]  .96 [5.57]   .99 [.07]
nLE nSK   .97 [.01]  .96 [.01]  .94 [.03]  .93 [.46]   .95 [.11]  .79 [6.79]   .99 [.09]

BF-PGAS
sLE sSK   .77 [.19]  .64 [.04]  .51 [.18]  .87 [.57]   .78 [.86]  .24 [15.04]  .86 [.98]
sLE nSK   .82 [.14]  .77 [.06]  .62 [.26]  .84 [1.34]  .69 [.67]  .31 [9.60]   .87 [1.64]
nLE sSK   .84 [.09]  .68 [.05]  .41 [.33]  .76 [.88]   .74 [.93]  .24 [11.24]  .79 [1.44]
nLE nSK   .91 [.10]  .81 [.04]  .57 [.41]  .63 [.76]   .84 [.34]  .47 [10.07]  .85 [.67]
aLE aSK   .84 [.12]  .83 [.02]  .45 [.43]  .81 [1.21]  .62 [.88]  .35 [8.65]   .89 [1.27]

Reported are the correlations between posterior means from the four estimation methods applied to the different datasets and the true DGP values, with mean absolute deviations given in brackets.
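The two accuracy statistics reported in Table 11 are simple to compute; a minimal sketch (ours, with hypothetical inputs standing in for posterior means and DGP values):

```python
import numpy as np

def accuracy_stats(post_means, dgp_values):
    """Correlation and mean absolute deviation between posterior means and
    true DGP values, the two statistics reported in Table 11."""
    post_means = np.asarray(post_means, dtype=float)
    dgp_values = np.asarray(dgp_values, dtype=float)
    corr = np.corrcoef(post_means, dgp_values)[0, 1]
    mad = np.mean(np.abs(post_means - dgp_values))
    return corr, mad

# Hypothetical example: posterior means of phi across the 58 SV models,
# equal to the DGP values plus small estimation noise
rng = np.random.default_rng(0)
phi_dgp = rng.uniform(0.9, 0.99, size=58)
phi_post = phi_dgp + rng.normal(0, 0.01, size=58)
corr, mad = accuracy_stats(phi_post, phi_dgp)
```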
Among all system parameters, the factor loadings Λ are the best estimated, with the EIS-related methods showing correlations of at least 0.98. The smallest correlations for Λ given by MM-MH and BF-PGAS are 0.93 and 0.79, respectively. The mean absolute deviations for Λ are also low, except for BF-PGAS under all datasets. This shows the effectiveness of our proposed sampling method for the factor loadings.
Figure 11 shows the posterior probability of zero leverage effect and zero skewness, i.e.
p(ρ = 0|y_1:T) and p(β = 0|y_1:T), both of which are (n + p)-dimensional vectors, where
ρ = (ρ_{f_j}, ρ_{u_i}) and β = (β_{f_j}, β_{u_i}) for i = 1, ..., n and j = 1, ..., p. The black dots at the top of each graph
indicate a zero DGP value for the corresponding series, and we represent the posterior probability
of being zero estimated from the different methods using different symbols. Notice that the estimates
for ρ and β are obtained with the help of the shrinkage prior introduced in Section 2.3.1, so new draws
in the MCMC sampler for all elements of ρ and β have non-zero probability of being exactly zero. The
closer a point is to a black dot, the better the respective method is able to detect zero
leverage effect or skewness in the DGP. When both zero leverage effect and zero skewness are
present in some of the factors and asset-specific processes, EIS-PGAS has the fewest points located
in the “ambiguity” area, namely between 0 and 1. One can see that whenever ρ = 0
and β = 0, EIS-PGAS gives a posterior probability of being zero larger than 0.9 for ρ and
0.8 for β. Under sLE sSK, there are three cases of overshrinkage for ρ and just one for β using
EIS-PGAS, while the other methods clearly overestimate the posterior zero probabilities.
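The mechanics behind these posterior zero probabilities are simple to reproduce: with a spike-and-slab style shrinkage prior, individual MCMC draws are exactly zero with positive probability, so the posterior probability of a zero parameter is just the fraction of zero draws. A minimal sketch with hypothetical numbers (the spike weight of 0.85 and the slab N(-0.4, 0.1^2) are illustrative, not the paper's prior):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in MCMC draws for one leverage parameter rho_i under a shrinkage prior:
# with probability 0.85 the draw sits in the spike at exactly zero,
# otherwise it comes from a continuous slab (both values are illustrative).
n_draws = 10_000
in_spike = rng.uniform(size=n_draws) < 0.85
draws = np.where(in_spike, 0.0, rng.normal(-0.4, 0.1, size=n_draws))

# Posterior probability of zero leverage effect = fraction of exactly-zero draws
p_zero = np.mean(draws == 0.0)
```

A point plotted near a black dot in Figure 11 then corresponds to `p_zero` close to 1 for a series whose DGP value is zero.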
EIS-PG and MM-MH perform similarly in determining zero parameters, and they are not
much worse than EIS-PGAS when the DGP values are zero, but these two methods, especially
MM-MH, deliver too many points in the ambiguity area. In other words, when the leverage
effect and skewness have non-zero DGP values, EIS-PG and MM-MH hesitate more than EIS-PGAS
to assign non-zero values to those parameters. This observation under sLE sSK carries over
to all datasets except nLE nSK, which highlights the value of ancestor sampling if one aims
not only to detect zero parameters but also to accurately estimate non-zero ones. Under
aLE aSK, all three methods other than BF-PGAS show no overshrinkage, but EIS-PG and
MM-MH have more points in the ambiguity area than EIS-PGAS, particularly for ρ.
This shows the effect of the shrinkage prior on leverage and skewness when importance sampling is
coupled with ancestor sampling. Moreover, under nLE nSK, i.e. when all members of ρ and β are
equal to zero, EIS-PG(AS) and MM-MH perform equally well, with all posterior probabilities of
zero parameters approaching one. BF-PGAS is the worst estimation method among all, suffering
from its estimation inaccuracy.
Figure 11: Posterior probability of zero leverage effect and skewness estimated by different methods under different datasets. Upper row: leverage effect parameter ρ; lower row: skewness parameter
β. Black dots at the top of each graph indicate a zero parameter for the corresponding series in the DGP. Coordinates
1 to 8 on all x-axes indicate factors f_j,t for j = 1, ..., 8 and the rest correspond to u_i,t for i = 1, ..., 50.
To examine the estimation efficiency of the different MCMC algorithms, we
calculate the inefficiency factor IE(θ) for the vector of system parameters θ. Because the results
are similar under different datasets, we only report those under aLE aSK.12 Table 12
reports the median inefficiency factors obtained from the four methods under aLE aSK, with
the 10th and 90th percentiles in brackets. A notable observation is that, although the factor SV model
is high-dimensional, the fact that during one run of the MCMC sampler it can be decomposed into
n + p individual univariate models (1) once the factors {f_j,t}, j = 1, ..., p, are sampled greatly improves the
efficiency, which becomes comparable to that of the univariate model. Across all four methods, IE(φ) and
IE(µ) are the smallest, and those given by EIS-PGAS are less than half of those given by the
other three methods. MM-MH even produces an estimate for µ with a median inefficiency
factor of 42.48, almost six times larger than the 7.48 given by EIS-PGAS. As in the univariate model,
ancestor sampling contributes a lot to the efficiency of the MCMC sampler. EIS-PGAS is at
12 This particular dataset is chosen because it has non-zero leverage effect and skewness across all factors and asset-specific processes. Other datasets with either ρ or β (or both) equal to zero show larger inefficiency factors, but this is due to many consecutive zeros in the Markov chain.
Table 12: Inefficiency Factor for Parameter Estimates Under aLE aSK
EIS-PGAS MM-MH EIS-PG BF-PGAS
φ  8.12  [4.97 14.86]    24.68  [18.94 29.52]    14.69  [10.64 19.48]    18.87 [14.31 22.49]
σ  21.33 [8.62 27.53]    124.20 [111.37 134.79]  78.54  [34.73 94.39]    64.61 [48.03 92.59]
ρ  22.74 [18.40 28.69]   107.56 [84.16 147.33]   84.57  [41.58 106.34]   87.53 [55.21 104.46]
µ  7.48  [5.97 9.06]     42.48  [37.64 48.76]    16.06  [10.55 23.74]    27.45 [14.62 44.78]
β  19.73 [14.82 26.45]   214.78 [149.60 307.43]  117.41 [89.65 134.80]   54.73 [39.06 82.36]
ζ  43.80 [31.69 67.81]   371.81 [256.14 504.26]  108.56 [86.17 134.89]   53.29 [45.88 67.21]
Λ  33.43 [24.74 51.12]   37.85  [27.84 48.57]    41.94  [32.50 57.93]    46.28 [37.66 63.18]
The median of the vector of inefficiency factors is reported, with the 10th and 90th percentiles shown in brackets.
least twice as efficient as EIS-PG, and for β it is more than five times more efficient.
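As a concrete reference, the inefficiency factor of a chain can be computed as one plus twice the sum of its autocorrelations; the sketch below uses a common truncation rule (stop at the first negative sample autocorrelation), which may differ in detail from the estimator used in the paper.

```python
import numpy as np

def inefficiency_factor(chain, max_lag=200):
    """IE = 1 + 2 * sum of autocorrelations; a chain of i.i.d. draws gives
    a value near 1, while a sticky (highly autocorrelated) chain gives a large value."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    acf = np.array([np.dot(x[:n - k], x[k:]) / denom for k in range(1, max_lag + 1)])
    # Truncate the sum at the first negative sample autocorrelation
    cut = int(np.argmax(acf < 0)) if np.any(acf < 0) else max_lag
    return 1.0 + 2.0 * acf[:cut].sum()

# Illustration: an AR(1) chain with coefficient 0.9 has a theoretical
# inefficiency factor of (1 + 0.9) / (1 - 0.9) = 19; i.i.d. draws give about 1.
rng = np.random.default_rng(0)
iid_chain = rng.normal(size=20_000)
ar_chain = np.empty(20_000)
ar_chain[0] = 0.0
for t in range(1, ar_chain.size):
    ar_chain[t] = 0.9 * ar_chain[t - 1] + rng.normal()
```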
Furthermore, the proposed procedure of constructing the importance density following EIS
plays an equally important role in improving efficiency. Comparing BF-PGAS with EIS-PGAS,
it is easy to see that, except for the loading matrix Λ, the latter gives inefficiency factors
much smaller than the former, indicating again the necessity of constructing an importance
density that closely approximates the posterior distribution of parameters in a high-dimensional
model. The inefficiency factors for the loading matrix Λ given by the four methods are, however,
similar, as all rely on marginalisation of factors when sampling Λ.
The use of n + p EIS importance densities in the form of (8) gains accuracy and
efficiency not only for the parameter estimates but also for the latent processes, i.e. the SV series
h_t = {h_j,t}, j = 1, ..., p, and l_t = {l_i,t}, i = 1, ..., n, the inverse gamma components
W_t = {W_j,t}, j = 1, ..., p, and Q_t = {Q_i,t}, i = 1, ..., n, and
the factor processes {f_j,t}, j = 1, ..., p. The estimates for h_t and l_t are of special interest as they are related
to covariance matrix forecasting and risk evaluation for a portfolio of equity returns. As an
example, Figure 12 illustrates eight SV series, from h_t in the upper row and l_t in the lower row, with
posterior mean estimates given by EIS-PGAS and their DGP values. It can be clearly seen that
EIS-PGAS achieves high accuracy in the smoothed mean estimates of the stochastic volatility series,
both following the overall pattern of the DGP processes and capturing some extreme
values. The converged n + p EIS importance densities together provide a very close approximation
of the intractable posterior distribution of these latent processes. In our experiment, even with
random starting parameter values, EIS-PGAS manages to produce reasonable sample paths.
Figure 13 shows the correlation between the posterior means of all the factors and eight chosen SV
series with their inverse gamma mixture components, estimated from the four methods, and their
DGP values. Except for BF-PGAS, the correlation for f_1,t seems to be higher than for all the other
Figure 12: Posterior mean estimates for the stochastic volatility series h_j,t and l_i,t. Logarithm of the stochastic
volatility of the factor processes f_j,t (upper row) and idiosyncratic noise processes u_i,t (lower row). (i): h_2,t; (ii): h_4,t;
(iii): h_5,t; (iv): h_8,t; (v): l_6,t; (vi): l_16,t; (vii): l_35,t; (viii): l_45,t.
factors, with EIS-PG(AS) and MM-MH giving correlations larger than 0.9 and 0.75, respectively,
under all datasets. The difference between the correlation for f_1,t and for f_j,t with j ≠ 1 is likely
due to the identification restriction imposed on the loading matrix Λ. The correlation for the factor
estimates given by EIS-PG is on average slightly lower than for EIS-PGAS, with exceptions found in
f_5,t under sLE sSK and f_7,t under nLE nSK. This suggests that ancestor sampling adds a certain
degree of precision because of the efficiency gain on top of EIS. For h_t and l_t, EIS-PGAS
is also the best estimation method. For example, under sLE sSK both EIS-PG and MM-MH
give correlations smaller than 0.7 for h_2,t. Under sLE aSK and aLE aSK, the gain in precision
from ancestor sampling is seen in correlations from EIS-PGAS being higher than those from EIS-PG by 5%
to 10%, which suggests that when skewness is present in all factors and asset-specific processes,
ancestor sampling tends to be more effective when used together with the shrinkage prior for ρ
and β.
The “shock variables” W_t and Q_t may be of limited use in practice, but they serve as stochastic
weights and influence the leverage effect as we show in Section 2.1, so it is still interesting to see
how the four methods perform when estimating the inverse gamma mixture components. BF-PGAS
is still the most inaccurate, and comparing with the estimates for the factors
and SV series one observes that EIS-PG is again eclipsed by EIS-PGAS. Lastly, under nLE nSK, EIS-PG
is almost as efficient as MM-MH, but both give correlations lower than EIS-PGAS. This
emphasises the effect of leverage and skewness on the chosen MCMC algorithms.
Figure 13: Correlations between the posterior mean estimates of all factors and some stochastic volatility series with corresponding inverse gamma mixing components, obtained from four estimation methods, and their DGP series under different datasets.
C.4 Number of factors
Marginal likelihood evaluation is needed to calculate the Bayes factor for model selection,
such as determining the right number of factors and choosing the most plausible specifications
for factors and asset-specific processes. We first illustrate the stability and ability of IS2 to
determine the right number of factors, which is the most important model specification. Notice
that there is no need to worry about error distributions thanks to the shrinkage technique we
apply.
Table 13 shows the EIS-PGAS conditional average log-likelihood, or posterior ordinate, with
system parameters evaluated at their posterior means. We report the evaluation with different
numbers of particles used in the modified PEIS method introduced in Section 3.1. Notice that
with this modification the constructed importance density boils down to the partially independent
n + p EIS importance densities used to analyse the univariate SV model, which closely approximates
the conditional posterior distribution and delivers posterior means that are
highly correlated with the DGP values. We expect that in our high-dimensional setting not
many particles are needed to accurately evaluate the conditional log-likelihood or posterior ordinate;
indeed, Scharth and Kohn (2016) found that in the case of a two-component SV model as few as
two particles can already compute the likelihood stably and accurately. From Table 13 we
see that the log-likelihood estimates for aLE aSK converge using at least 100 particles, and
Table 13: EIS-PGAS Log-likelihood Evaluation
Dataset   Number of particles
          100        200        300        500        1000
sLE sSK   -1864.79   -1834.26   -1833.68   -1833.27   -1833.34
sLE aSK   -1824.61   -1819.57   -1819.24   -1819.66   -1819.29
aLE sSK   -1841.27   -1828.03   -1830.86   -1830.94   -1830.62
aLE aSK   -1812.46   -1811.74   -1811.09   -1811.42   -1812.00
nLE nSK   -1923.95   -1920.53   -1921.81   -1921.86   -1921.44
Reported are the average log-likelihood evaluations at the posterior mean parameter estimates with different
numbers of particles.
for nLE nSK and sLE aSK, with more than 200 particles there is no major
difference in the log-likelihood. For aLE sSK and sLE sSK, more than 300 particles lead to a
converged log-likelihood.
It is reasonable to believe that the number of particles needed does not change significantly
across different parameter values when one computes the conditional likelihood or posterior
ordinate, a condition also assumed in Tran et al. (2014). So when applying IS2 to calculate the
marginal likelihood, 300 particles are used with each draw of parameters. Following formula
(27), we construct the importance density for the parameters, q(θ|y_1:T), via a mixture of Gaussian
distributions estimated from the MCMC samples. We treat each set of parameters, such as φ
and σ, as an individual (n + p)-dimensional Gaussian mixture random vector and initialise the
number of components using a standard k-means algorithm. The mixture can also be constructed
using multivariate Student's t-distributions, but not much difference is found there.
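The resulting IS2 marginal likelihood estimate is the importance-sampling average of p(y|θ)p(θ)/q(θ|y_1:T) over draws from q. The sketch below uses numpy only and, for brevity, a single Gaussian component fitted to stand-in MCMC draws instead of the k-means-initialised Gaussian mixture; log_target is a hypothetical placeholder for the PEIS conditional likelihood estimate plus the log prior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in MCMC draws of a 2-dimensional parameter block (e.g. a slice of phi).
draws = rng.normal(loc=[0.90, 0.95], scale=0.02, size=(5000, 2))
mu, cov = draws.mean(axis=0), np.cov(draws, rowvar=False)

def log_q(theta):
    """Log density of the fitted Gaussian importance density q(theta | y)."""
    d = theta - mu
    prec = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum("ij,jk,ik->i", d, prec, d)
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + quad)

def log_target(theta):
    # Hypothetical stand-in for log p(y | theta) + log p(theta); in the paper
    # the first term is the PEIS conditional likelihood estimate with ~300 particles.
    d = (theta - np.array([0.90, 0.95])) / 0.02
    return -0.5 * np.sum(d * d, axis=1) - np.log(2 * np.pi * 0.02 * 0.02)

# IS2: marginal likelihood = E_q[ p(y|theta) p(theta) / q(theta) ],
# computed on the log scale with log-mean-exp for numerical stability.
theta_m = rng.multivariate_normal(mu, cov, size=2000)
log_w = log_target(theta_m) - log_q(theta_m)
log_ml = np.logaddexp.reduce(log_w) - np.log(len(log_w))
```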
We repeat the simulation exercise 30 times to obtain 30 sLE sSK datasets with different
realisations; the DGP is the same as before, with 8 factors. Out of the 30 simulated replications,
the IC_p1 criterion of Bai and Ng (2002) chooses 8 factors 21 times, and 6, 7, 9, and 10
factors twice, twice, four times, and once, respectively.
Table 14 shows the model comparisons using Bayes factors, with the left column indicating
the comparison among models with different numbers of factors. Jeffreys' scale suggests
decisive evidence in favor of the model with 8 factors in all cases. It can be concluded that
if the true DGP is the proposed factor SV model, the Bayes factor outperforms the criterion of Bai
and Ng (2002) in determining the number of factors. Comparing the model with 7 factors and the
one with 6 factors, Jeffreys' scale favors the model with 7 factors in 83.33% of the cases.
Similarly, the model with 9 factors is favored over the one with 10 factors in 83.33% of the cases,
Table 14: Frequency(%) of Bayes Factors With Different Number of Factors
sLE sSK, DGP: 8 factors
        1-3.2   3.2-10   10-100   >100     Total >10
8/6     0       0        0        100.00   100.00
8/7     0       0        0        100.00   100.00
8/9     0       0        3.33     96.67    100.00
8/10    0       0        0        100.00   100.00
7/6     0       16.67    40.00    43.33    83.33
7/9     0       6.67     56.67    36.67    93.33
7/10    0       3.33     6.67     90.00    96.67
9/6     0       0        0        100.00   100.00
9/10    0       16.67    36.67    46.67    83.33
The choice of ranges for the Bayes factors follows Jeffreys' scale. The frequency distribution is determined across
30 simulated replications. The left column indicates the comparison between two specifications; for example, 8/6
corresponds to the Bayes factor with the marginal likelihood of the model with 8 factors in the numerator and that
of the model with 6 factors in the denominator.
so we conjecture that IS2 may also be effective in selecting a misspecified model that is “closer”
to the true model. In our experiment, besides its performance in choosing the number of factors, we
also notice that IS2 is as stable as the reduced MCMC run method of Chib and Greenberg
(1994) and Nakajima (2015), but much easier to implement.
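For reference, the Bayes factor and its Jeffreys-scale category in Table 14 follow directly from two log marginal likelihood estimates; the sketch below hard-codes the table's category boundaries, and the example log marginal likelihood values are illustrative.

```python
import math

def jeffreys_category(log_ml_a, log_ml_b):
    """Classify the Bayes factor exp(log_ml_a - log_ml_b) of model a over
    model b into the ranges used in Table 14 (Jeffreys' scale)."""
    bf = math.exp(log_ml_a - log_ml_b)
    if bf > 100:
        return ">100"      # decisive evidence for model a
    if bf > 10:
        return "10-100"    # strong evidence
    if bf > 3.2:
        return "3.2-10"    # substantial evidence
    return "1-3.2"         # barely worth mentioning

# A log marginal likelihood gap of 6 gives BF = exp(6), i.e. decisive evidence.
category = jeffreys_category(-1811.0, -1817.0)
```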