A Bayesian Analysis of Multiple Change Point Problems
8/3/2019 A Bayesian Analysis of Multiple Change Point Problems
http://slidepdf.com/reader/full/a-bayesian-analysis-of-multiple-change-point-problems 1/26
A Bayesian analysis of multiple change point problems in data sequences
Rosangela H. Loschi¹
Departamento de Estatística, Universidade Federal de Minas Gerais,
Belo Horizonte - MG, Brazil
email: [email protected]
Pilar L. Iglesias and Reinaldo B. Arellano-Valle
Departamento de Estadística, Facultad de Matemáticas,
Pontificia Universidad Católica de Chile, Santiago, Chile
email: {pliz, reivalle}@mat.puc.cl
Frederico R. B. Cruz
Departamento de Estatística, Universidade Federal de Minas Gerais,
Belo Horizonte - MG, Brazil
email: [email protected]
Abstract
We apply the product partition model (PPM) to identify multiple change points
in normal means (µ) and variances (σ²), extending some previous works. We
establish a full predictivistic characterization for the prior distribution of µ and
σ², which yields an easier way to obtain the prior distribution of these parameters
by considering opinion on observable quantities only. We also propose a Gibbs
sampling scheme to estimate the posterior distributions of the number of change
points and of the instants when changes occurred. We apply the results to identify
multiple changes in the expected return and the volatility of a series of returns
in the Chilean stock market, providing a sensitivity analysis of the model when
different prior specifications are considered. We conclude that the Chilean market
exhibits expected-return and volatility clusters and that the product estimates
are influenced by the prior specifications.
Keywords: Gibbs sampling, predictivism, product partition model, Student-t
distribution.
¹Corresponding author. Departamento de Estatística, ICEx, Universidade Federal de Minas Gerais, Caixa
Postal 702, CEP 31270-901, Belo Horizonte, MG, Brasil. Fax: +55 33 3499 5924
1 Introduction
In this paper we consider a Bayesian analysis of the multiple change point problem using
the product partition model (PPM) proposed by Hartigan (1990). The PPM allows the
identification of multiple change points in the parameters as well as in the functional form of
the distribution itself. Moreover, the PPM introduces some flexibility into the
analysis of change point problems, since the number of change points is a random variable (as
opposed to the known number assumed in threshold models (Chen and Lee (1995), Geweke
and Terui (1993)) and in the model considered by Hawkins (2001), for example).
The one change point problem has been approached from the Bayesian point of view by sev-
eral authors. For example, Menzefricke (1981) considers the problem of making inferences
about a change point in the precision of normal data with unknown mean. A single change
point in the functional form of the distribution is explored by Hsu (1984), who considers the
class of the exponential-power distributions (Box and Tiao, 1973) for treating the problem.
Both authors apply their methodologies to stock market prices. The Bayesian identification
of a single change point is also discussed by Smith (1975). The PPM proposed later by
Hartigan (1990) generalizes most situations described before. The PPM is applied by Barry
and Hartigan (1993) to identify multiple change points in the mean of normal random vari-
ables with common variance. More recently, Crowley (1997) provided a new implementation of
Gibbs sampling for estimating normal means using the PPM. The
identification of change points in normal means with common variance is also considered by
Chernoff and Zacks (1964) and Gardner (1969) using different Bayesian approaches. (More
about change point problems can be found in Carlstein, Mueller and Siegmund (1994).)
The aim of this paper is to apply the PPM presented by Barry and Hartigan (1992) to
identify multiple change points in both the mean µ and the variance σ2 of normal data
which are sequentially observed, extending some results from Barry and Hartigan (1993)
and Crowley (1997). We consider a conjugate prior distribution for the parameters µ and
σ2, justifying this choice within a full predictivistic setting due to de Finetti (1937). In fact,
we propose a more tractable way to elicit the prior distribution of µ and σ² by considering
only opinions on observable quantities. We also use Yao’s (1984) algorithm to compute the
posterior estimates or product estimates for these parameters. A Gibbs sampling scheme to
estimate the posterior distributions of the number of change points as well as the instants
when changes occurred is proposed. Although it uses the transformation suggested by Barry
and Hartigan (1993), the proposed method for estimating these posterior distributions has not,
to our knowledge, appeared in the literature. We also consider different prior specifications for the probability that
a change occurs in any instant and evaluate the sensitivity of the PPM to these different
choices. In order to illustrate the method, the results are applied to identify multiple change
points in the mean and variance of a series of returns of the Chilean stock market. As a
consequence, it is reported that returns in the Chilean stock market are characterized by
changes in the expected or mean return and volatility (measured here as variance).
The PPM introduced by Barry and Hartigan (1992) is briefly reviewed in Section 2. Later in
Section 2 we obtain the Student-t PPM for random variables which are normally distributed,
given the mean and variance (both unknown), providing the posterior estimation for these
parameters. A predictivistic characterization of the Student-t PPM, which explains the
choice of the prior distributions adopted in an alternative way, is provided as a by-product.
In Section 3, we introduce procedures based on Gibbs sampling schemes to compute the pos-
terior distributions for the random partition and for the number of change points, assuming
normal data. Finally, in Section 4 we apply the procedures obtained in Sections 2 and 3
to identify change points in the mean return as well as in the volatility of Endesa (Chilean
National Electric Company) returns. We also provide a sensitivity analysis to the PPM.
2 The Student-t PPM
In this section we apply the Product Partition Model (PPM) introduced by Barry and Har-
tigan (1992) to identify change points in the mean and variance of normal data observed
through time. We consider a conjugate analysis and present a full predictivistic characteri-
zation to the complete model (likelihood function and prior distribution). First, we present
the definition of PPM and some preliminary results obtained from this model, as given by
Barry and Hartigan (1992, 1993).
2.1 The product partition model (PPM)
Let X1, . . . , Xn be a data sequence. Consider a random partition ρ of the set I = {1, . . . , n}
and a random variable B representing the number of blocks in ρ. Each partition
ρ = {i0, i1, . . . , ib}, 0 = i0 < i1 < · · · < ib = n, divides the sequence X1, . . . , Xn into B =
b, b ∈ I, contiguous subsequences, denoted by X[i_{r−1}i_r] = (X_{i_{r−1}+1}, . . . , X_{i_r}),
r = 1, . . . , b. Let c[ij] be the prior cohesion associated with the block [ij] = {i + 1, . . . , j},
i, j ∈ I ∪ {0}, j > i, which represents the degree of similarity among the observations in X[ij]
(Hartigan, 1990).
Hence, the random quantity (X1, . . . , Xn; ρ) is said to follow a PPM, denoted by
(X1, . . . , Xn; ρ) ∼ PPM, if:

i) the prior distribution of ρ is the following product distribution:

P(ρ = {i0, . . . , ib}) = ∏_{j=1}^{b} c[i_{j−1}i_j] / Σ_C ∏_{j=1}^{b} c[i_{j−1}i_j],   (2.1)

where the sum in the denominator is over C, the set of all possible partitions of the set I into b contiguous blocks with
end points i1, . . . , ib, satisfying the condition 0 = i0 < i1 < . . . < ib = n, b ∈ I;
ii) conditionally on ρ = {i0, . . . , ib}, the sequence X1, . . . , Xn has the joint density given
by

f(X1, . . . , Xn | ρ = {i0, . . . , ib}) = ∏_{j=1}^{b} f[i_{j−1}i_j](X[i_{j−1}i_j]),   (2.2)

where f[ij](X[ij]) is the joint density of the random vector X[ij] = (X_{i+1}, . . . , X_j).
Notice that the number of blocks B in ρ has a prior distribution given by

P(B = b) ∝ Σ_{C1} ∏_{j=1}^{b} c[i_{j−1}i_j],   b ∈ I,   (2.3)

where the sum is over C1, the set of all partitions of I into b contiguous blocks.

As shown in Barry and Hartigan (1992), the posterior distributions of ρ and B have the
same form as the prior distributions, with the posterior cohesion for the block [ij] given by
c*[ij] = c[ij] f[ij](X[ij]). That is, the PPM induces a kind of conjugacy.
In the parametric approach to the PPM, we consider a sequence of unknown parameters θ1, . . . , θn such
that, conditionally on θ1, . . . , θn, the random variables X1, . . . , Xn have conditional
marginal densities f1(X1|θ1), . . . , fn(Xn|θn), respectively. In this case, two observations
Xi and Xj, i ≠ j, are considered to be in the same block if it
is believed that they are identically distributed. Thus, in this approach to the PPM, the
predictive distribution f[ij](X[ij]), which appears in (2.2), can be obtained as follows:

f[ij](X[ij]) = ∫_{Θ[ij]} f[ij](X[ij] | θ) π[ij](θ) dθ,   (2.4)

where Θ[ij] is the parameter space corresponding to the common parameter, say θ[ij] =
θ_{i+1} = . . . = θ_j, which indexes the conditional density of X[ij].
The prior distribution of θ1, . . . , θn is constructed as follows. Given a partition ρ = {i0, . . . , ib},
b ∈ I, we have that θi = θ[ir−1ir] for every ir−1 < i ≤ ir, r = 1, . . . , b, and that θ[i0i1], . . . , θ[ib−1ib]
are independent, with θ[ij] having (block) prior density π[ij](θ), θ ∈ Θ[ij].
Hence, the goal in the parametric PPM is to obtain the marginal posterior distributions of
the parameters ρ, B, and θk, k = 1, . . . , n. Barry and Hartigan (1992) have shown that the
posterior distribution of θk is given by

π(θk | X1, . . . , Xn) = Σ_{i=0}^{k−1} Σ_{j=k}^{n} r*[ij] π[ij](θk | X[ij]),   (2.5)

for k = 1, . . . , n, and the posterior expectation of θk is given by

E(θk | X1, . . . , Xn) = Σ_{i=0}^{k−1} Σ_{j=k}^{n} r*[ij] E(θk | X[ij]),   (2.6)

for k = 1, . . . , n, where r*[ij] denotes the posterior relevance of the block [ij], that is,

r*[ij] = P([ij] ∈ ρ | X1, . . . , Xn) = λ[0i] c*[ij] λ[jn] / λ[0n],   (2.7)

where λ[ij] = Σ ∏_{k=1}^{b} c*[i_{k−1}i_k], the summation being over all partitions of {i + 1, . . . , j} into b
contiguous blocks with end points i0, i1, . . . , ib satisfying the condition i = i0 < i1 < · · · < ib = j.
2.2 Product estimates for normal means and variances
Assume that θ1 = (µ1, σ²1), . . . , θn = (µn, σ²n) are such that Xk | µk, σ²k ∼ N(µk, σ²k), k = 1, . . . , n,
independently. Denote by θ[ij] = (µ[ij], σ²[ij]) the common parameter related to the
block [ij]. Thus, the Student-t PPM can be specified by considering the following conditional
(j − i)-dimensional normal distribution for the observations in X[ij]:

X[ij] | µ[ij], σ²[ij] ∼ N_{j−i}(µ[ij] 1_{j−i}, σ²[ij] I_{j−i}),   (2.8)

where 1_k and I_k are the k-dimensional vector of ones and the k × k identity
matrix, respectively, as well as by assuming that (µ[ij], σ²[ij]) has the normal-inverted-gamma
prior distribution, denoted by (µ[ij], σ²[ij]) ∼ NIG(m[ij], v[ij]; a[ij]/2, d[ij]/2), that is,

µ[ij] | σ²[ij] ∼ N(m[ij], v[ij] σ²[ij])   and   σ²[ij] ∼ IG(a[ij]/2, d[ij]/2),   (2.9)

where IG(a, b) denotes the inverted-gamma distribution with parameters a and b. Under (2.8) and
(2.9), the conditional distribution of θ[ij] = (µ[ij], σ²[ij]), given the observations in X[ij], is the
normal-inverted-gamma distribution given by

µ[ij] | X[ij], σ²[ij] ∼ N(m*[ij], v*[ij] σ²[ij])   and   σ²[ij] | X[ij] ∼ IG(a*[ij]/2, d*[ij]/2),   (2.10)
where

m*[ij] = [(j − i) v[ij] X̄[ij] + m[ij]] / [(j − i) v[ij] + 1],
v*[ij] = v[ij] / [(j − i) v[ij] + 1],
d*[ij] = d[ij] + j − i,
a*[ij] = a[ij] + q[ij](X[ij]),   (2.11)

with

X̄[ij] = (1/(j − i)) Σ_{r=i+1}^{j} X_r,

q[ij](X[ij]) = Σ_{r=i+1}^{j} (X_r − X̄[ij])² + [(j − i)(X̄[ij] − m[ij])²] / [(j − i) v[ij] + 1].
(See O’Hagan (1994) for details). Therefore, we obtain from (2.10) and (2.6) that the product
estimates for µk and σ²k are given by

E(µk | X1, . . . , Xn) = Σ_{i=0}^{k−1} Σ_{j=k}^{n} r*[ij] m*[ij]   (if d*[ij] > 1)   (2.12)

and

E(σ²k | X1, . . . , Xn) = Σ_{i=0}^{k−1} Σ_{j=k}^{n} r*[ij] a*[ij] / (d*[ij] − 2)   (if d*[ij] > 2),   (2.13)

respectively, for k = 1, . . . , n, where m*[ij], a*[ij] and d*[ij] are defined as in (2.11).
Notice that the PPM induced by (2.8) and (2.9) implies that, for each block [ij], the random
vector X[ij] follows a (j − i)-dimensional Student-t distribution, denoted by X[ij] ∼
t_{j−i}(m[ij], V[ij]; a[ij], d[ij]), with density function given by

f(X[ij]) = c(d[ij], j − i) a[ij]^{d[ij]/2} |V[ij]|^{−1/2} {a[ij] + (X[ij] − m[ij])′ V[ij]^{−1} (X[ij] − m[ij])}^{−(d[ij]+j−i)/2},   (2.14)

where c(d, k) = Γ[(d + k)/2] {Γ[d/2] π^{k/2}}^{−1}, m[ij] = m[ij] 1_{j−i} and V[ij] = I_{j−i} + v[ij] 1_{j−i} 1′_{j−i}.
The distribution in (2.14) is called by Arellano-Valle and Bolfarine (1995) the generalized
Student-t distribution; it reduces to the usual Student-t distribution with d[ij] degrees
of freedom and the same dispersion matrix when a[ij] = d[ij]. Notice that, under this
model, the elements within the same block are correlated and distributed according to a
distribution with heavier tails than the normal distribution. Moreover, for the block [ij] it
follows that

E(X_j | X_{j−1}, . . . , X_{i+1}) = E(µ[ij] | X_{j−1}, . . . , X_{i+1}) = m*[i(j−1)]

and

E(X²_j | X_{j−1}, . . . , X_{i+1}) = E[(σ²[ij] + µ²[ij]) | X_{j−1}, . . . , X_{i+1}]
= [(j − i) v[ij] + 1] / [(j − i − 1) v[ij] + 1] · a*[i(j−1)] / (d*[i(j−1)] − 2) + (m*[i(j−1)])²,

where m*[i(j−1)], d*[i(j−1)] and a*[i(j−1)] are defined as in (2.11).
2.3 Yao’s algorithm
In order to compute the posterior relevances given in (2.7), we consider the following recursive
algorithm proposed by Yao (1984):

λ[00] = 1,
λ[01] = c*[01],
λ[0j] = c*[0j] + Σ_{t=1}^{j−1} λ[0t] c*[tj],   j = 2, . . . , n,
λ[(n−1)n] = c*[(n−1)n],
λ[in] = c*[in] + Σ_{t=i+1}^{n−1} λ[tn] c*[it],   i = 1, . . . , n − 2,
λ[nn] = 1,   (2.15)
where λ[ij] is the summation presented in (2.7) and c∗[ij] is the posterior cohesion of the block
[ij]. A Gibbs sampling scheme to compute the posterior relevances can be found in Loschi
et al. (2003). See Barry and Hartigan (1993) for a Gibbs sampling scheme to compute the
product estimates directly.
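Yao's recursions (2.15), together with the relevances (2.7), can be implemented directly. The following is a minimal sketch in which the posterior cohesions c*[ij] are supplied as a precomputed table:

```python
def yao_recursions(cstar, n):
    """Yao's (1984) recursions (2.15).

    cstar : dict mapping (i, j), 0 <= i < j <= n, to the posterior cohesion c*[ij];
    returns (lam0, lamn) with lam0[j] = lambda[0j] and lamn[i] = lambda[in].
    """
    lam0 = {0: 1.0, 1: cstar[(0, 1)]}
    for j in range(2, n + 1):
        lam0[j] = cstar[(0, j)] + sum(lam0[t] * cstar[(t, j)] for t in range(1, j))
    lamn = {n: 1.0, n - 1: cstar[(n - 1, n)]}
    for i in range(n - 2, 0, -1):
        lamn[i] = cstar[(i, n)] + sum(lamn[t] * cstar[(i, t)] for t in range(i + 1, n))
    return lam0, lamn

def posterior_relevance(i, j, cstar, lam0, lamn):
    """r*[ij] = lambda[0i] c*[ij] lambda[jn] / lambda[0n], equation (2.7)."""
    n = max(lam0)
    return lam0[i] * cstar[(i, j)] * lamn[j] / lam0[n]
```

The two passes cost O(n²) products, after which every relevance r*[ij] is available in constant time; λ[0n] serves as the normalizing constant.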
2.4 A Predictivistic justification of the Student-t PPM
Eliciting prior distributions for real problems is often not an easy task. In this
section we establish a full predictivistic characterization of the Student-t PPM presented in
Section 2.2, in which the likelihood function as well as the prior distribution of µ and σ² are
consequences of judgements on observable quantities. As a by-product, this characterization
provides a tractable way to elicit the prior distribution of (µ, σ²).
As shown in Section 2.2, the Student-t distribution is a location and scale mixture of the
normal distribution, where the mixing measure is the normal-inverted-gamma distribution.
Thus, it follows that the Student-t distribution can be obtained in two stages. Firstly, a
conditional normal distribution, given the location and scale parameters, is specified. Sec-
ondly, we identify a normal-inverted-gamma distribution as the prior joint distribution for
the location and scale parameters. By adopting the predictivistic approach of de Finetti (1937),
the first stage is replaced by an assumption about observables (Iglesias (1993) and Wechsler
(1993)). For example, the assumption of invariance under some group of orthogonal
transformations over infinite sequences of random quantities implies that the law of the sequence
of observables can be represented as a mixture of conditionally normally distributed and
independent quantities (see Kingman (1972), Smith (1981), Diaconis, Eaton and Lauritzen
(1992)). However, this type of condition does not permit the characterization of the mixing
measure; additional conditions have to be assumed to obtain it. Arellano-Valle, Bolfarine and
Iglesias (1994), following Diaconis and Ylvisaker (1979, 1985), characterize a scale mixture
of a normal distribution by considering invariance under orthogonal transformations together
with additional conditions which determine how to predict X²_{n+1}. In the full
predictivistic approach considered by Arellano-Valle, Bolfarine and Iglesias (1994), the mixing
measure (prior distribution) obtained is the inverted-gamma distribution. These authors
also obtain a characterization for a location and scale mixture of normal distributions which
depends on non-observable quantities; that is, it is not a full predictivistic characterization
of the model. Proposition 2.1 below improves this partial result.
Consider X̄n = (1/n) Σ_{i=1}^{n} Xi and S²n = Σ_{i=1}^{n} (Xi − X̄n)². We say that an infinite sequence
of random variables X1, X2, . . . is O(1)-invariant if, for each n ≥ 2 and real values m and
r, the conditional distribution of X[0n], given X̄n = m and S²n = r², is uniform on the n-
sphere centred at m 1n with radius r, that is, on the set Sn = {(x1, . . . , xn) ∈ Rⁿ : x̄n = m,
Σ_{i=1}^{n} (xi − x̄n)² = r²}.
Proposition 2.1 Let X1, X2, . . . be an infinite sequence of O(1)-invariant random variables
such that P(X1 = X2) = 0 and

E(X3 | X1, X2) = e(X1 + X2) + u,   E(X²3 | X1, X2) = e(X²1 + X²2) + w.   (2.16)

Then e ∈ (0, 1/2), u ∈ R, w > u²/(1 − 2e) and, for each n ≥ 3,

X[0n] ∼ t_n( [u/(1 − 2e)] 1_n, I_n + [e/(1 − 2e)] 1_n 1′_n; (1/e)[w − u²/(1 − 2e)]; (1 + e)/e ).   (2.17)

The converse also holds.
Proof: From Smith's (1981) theorem, there exist random variables µ and σ² such that, for
every n ≥ 2,

X[0n] | µ, σ² ∼ N(µ 1_n, σ² I_n),

where σ² > 0 with probability one. Consequently, considering M = Σ_{i=1}^{2} X_i = 2X̄ and
Q = Σ_{i=1}^{2} X²_i = S² + 2X̄², and denoting by θ = (θ1, θ2) = (µ/σ², −1/(2σ²)) the natural
parameter of the distribution of (M, Q), given (µ, σ²), we obtain the following conditional
density of (M, Q) given θ:
dP_θ(M, Q) = exp{(θ1, θ2)(M, Q)ᵗ − D(θ)} dξ(M, Q),

where dξ(M, Q) = (1/(π√2)) (Q − M²/2)^{−1/2} dλ, λ is the Lebesgue measure defined on R², and
D(θ) = −θ²1/(2θ2) − log(−θ2).

The vector of partial derivatives of D(θ) with respect to the natural parameters θ1 and θ2 is
given by

∇D(θ) = ( −θ1/θ2, θ²1/(2θ²2) − 1/θ2 ) = E{(M, Q) | θ}.
Hence, by using properties of the conditional expectation and conditions (2.16), it follows
that

E{∇D(θ) | (M, Q)} = E{ E{(M, Q) | θ1, θ2} | (M, Q) }
= 2 E{ E{(X3, X²3) | (µ, σ²)} | X1, X2 }
= 2e(X1 + X2, X²1 + X²2) + 2(u, w).
From Theorem 3 in Diaconis and Ylvisaker (1979), the following prior density for (µ, σ²) is
obtained:

π(µ, σ²) = K (1/σ²)^{1/(2e) + 3/2} exp{ −(1/(2eσ²))[w − u²/(1 − 2e)] } [(1 − 2e)/(eσ²)]^{1/2} exp{ −[(1 − 2e)/(2eσ²)][µ − u/(1 − 2e)]² }.   (2.18)

Consequently, (2.17) is obtained (see O'Hagan (1994), p. 244). The converse is obtained by
using the properties of the Student-t distribution (see Arellano-Valle and Bolfarine (1995)).
Proposition 2.1 improves some partial results from Arellano-Valle, Bolfarine and Iglesias
(1994) by providing a full predictivistic characterization to a location and scale mixture of
normal distributions. Extensions of this result to Student-t linear models can be found in
Loschi, Iglesias and Arellano-Valle (2003).
Corollary 2.1 Consider the assumptions established in Proposition 2.1. Then the parameters
µ and σ² have the following normal-inverted-gamma distribution:

µ | σ² ∼ N( u/(1 − 2e), eσ²/(1 − 2e) )   and   σ² ∼ IG( (1/(2e))[w − u²/(1 − 2e)], (1 + e)/(2e) ).   (2.19)
Thus, under the O(1)-invariance assumption, the representations in (2.16) are equivalent to the
specification in (2.19).
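Corollary 2.1 gives a direct recipe for turning predictive judgements (e, u, w) about observables into hyperparameters of the normal-inverted-gamma prior. A small sketch of this mapping, with illustrative input values:

```python
def nig_from_predictive(e, u, w):
    """Map the predictive judgements (e, u, w) of (2.16) to the NIG
    hyperparameters of Corollary 2.1, in the paper's NIG(m, v; a/2, d/2)
    notation: m = u/(1-2e), v = e/(1-2e),
    a = (1/e)(w - u^2/(1-2e)), d = (1+e)/e.
    """
    assert 0 < e < 0.5 and w > u * u / (1 - 2 * e)  # conditions of Proposition 2.1
    m = u / (1 - 2 * e)
    v = e / (1 - 2 * e)
    a = (w - u * u / (1 - 2 * e)) / e
    d = (1 + e) / e
    return m, v, a, d

# For instance, e = 0.25, u = 0, w = 1 gives m = 0, v = 0.5, a = 4, d = 5,
# so the implied prior mean of sigma^2 is a/(d - 2) = 4/3.
hyper = nig_from_predictive(0.25, 0.0, 1.0)
```

In this way the prior is elicited entirely from opinions about the first two predictive moments, as the characterization intends.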
3 Posterior Distributions for ρ and B
In this section, we provide the exact posterior distribution for ρ and B assuming the prior
cohesions suggested by Yao (1984) and propose a Gibbs sampling scheme to estimate these
posterior distributions.
3.1 Exact Posterior Distributions
Let p, 0 ≤ p ≤ 1, be the probability that a change occurs at any given instant in the sequence.
The prior cohesion for block [ij] then corresponds to the probability that a new change
takes place after j − i instants, given that a change has taken place at instant i, that is,

c[ij] = p(1 − p)^{j−i−1},   if j < n,
c[ij] = (1 − p)^{j−i−1},   if j = n.   (3.1)
Notice that the prior cohesions given in (3.1) imply that the sequence of change points
forms a discrete renewal process, with occurrence times identically distributed according to a
geometric distribution. Choosing a high value for p amounts to assuming a priori that
there are small blocks of data (or, equivalently, a large number of change points) in the
data sequence. Assuming these cohesions, it follows from expression (2.1) that the prior
distribution of ρ takes the form

P(ρ = {i0, i1, . . . , ib}) = p^{b−1}(1 − p)^{n−b},

b ∈ I, which depends only on the number of observations n and the number of blocks b in the
partition, but not on the positions where the change points occur. Moreover,
it follows that the prior distribution of the random variable B is given by

P(B = b) = C(n−1, b−1) p^{b−1}(1 − p)^{n−b},   for all b ∈ I,

where C(n−1, b−1) denotes the binomial coefficient "n − 1 choose b − 1", the number of distinct partitions of I into b contiguous blocks. Since B − 1 then follows a binomial distribution with parameters n − 1 and p, we have

E(B) = 1 + (n − 1)p   (3.2)

and

V(B) = (n − 1)p(1 − p).   (3.3)
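Under the cohesions (3.1), the number of change points B − 1 is binomially distributed with parameters n − 1 and p. A quick numerical check of the prior pmf of B, with illustrative values of n and p:

```python
from math import comb

def prior_B(n, p):
    """Prior pmf of the number of blocks B under the cohesions (3.1):
    P(B = b) = C(n-1, b-1) p^(b-1) (1-p)^(n-b), b = 1, ..., n."""
    return {b: comb(n - 1, b - 1) * p ** (b - 1) * (1 - p) ** (n - b)
            for b in range(1, n + 1)}

n, p = 95, 0.1
pmf = prior_B(n, p)
mean_B = sum(b * pb for b, pb in pmf.items())                  # 1 + (n - 1) p
var_B = sum((b - mean_B) ** 2 * pb for b, pb in pmf.items())   # (n - 1) p (1 - p)
```

The mean is one block plus the expected number of change points, (n − 1)p, while the variance is that of the binomial count of change points.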
From Section 2.1, we only need to find the posterior cohesion for each block in order to obtain the
posterior distributions of ρ and B. Recalling that the posterior cohesion for the block [ij] is
obtained by multiplying the corresponding prior cohesion by the predictive distribution of
X[ij], which is the Student-t density defined in (2.14), the following result is obtained:

c*[ij] = p(1 − p)^{j−i−1} c(d[ij], j − i) a[ij]^{d[ij]/2} / [ (1 + (j − i)v[ij])^{1/2} {a[ij] + q[ij](X[ij])}^{(d[ij]+j−i)/2} ],   if j < n,

c*[ij] = (1 − p)^{j−i−1} c(d[ij], j − i) a[ij]^{d[ij]/2} / [ (1 + (j − i)v[ij])^{1/2} {a[ij] + q[ij](X[ij])}^{(d[ij]+j−i)/2} ],   if j = n,

where c(d, k) and q[ij](X[ij]) are defined as in (2.14) and (2.11), respectively.
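Because the posterior cohesions involve ratios of gamma functions and high powers, in practice they are best computed on the log scale. A sketch, with illustrative hyperparameters as defaults:

```python
import math

def log_posterior_cohesion(x_block, p, is_last, m=0.0, v=1.0, a=0.01, d=4.0):
    """Log of c*[ij] = c[ij] f[ij](X[ij]), combining Yao's cohesions (3.1)
    with the Student-t predictive (2.14); the default hyperparameters are
    illustrative, not those of the paper.

    x_block : observations in the block X[ij]; is_last : True when j = n.
    """
    k = len(x_block)
    xbar = sum(x_block) / k
    q = sum((xr - xbar) ** 2 for xr in x_block) + k * (xbar - m) ** 2 / (k * v + 1.0)
    # log c(d, k) = log Gamma((d+k)/2) - log Gamma(d/2) - (k/2) log pi
    log_c = math.lgamma((d + k) / 2.0) - math.lgamma(d / 2.0) - 0.5 * k * math.log(math.pi)
    log_pred = (log_c + 0.5 * d * math.log(a)
                - 0.5 * math.log(1.0 + k * v)
                - 0.5 * (d + k) * math.log(a + q))
    log_prior = (k - 1) * math.log(1.0 - p) + (0.0 if is_last else math.log(p))
    return log_prior + log_pred
```

Working on the log scale keeps the recursions in (2.15) and the Gibbs ratios of Section 3.2 numerically stable even for long blocks.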
Notice that the exact calculation of the posterior distributions of ρ and B demands a great
computational effort, in spite of the simplifications introduced by Yao's (1984) algorithm. In
the next section we propose a Gibbs sampling scheme for estimating the posterior distributions
of the random partition ρ and of the random quantity B (see Gelfand and Smith (1990) and
Gamerman (1997) for MCMC methods).
3.2 Gibbs Sampling Approach
Consider the auxiliary random quantities U_r suggested by Barry and Hartigan (1993), which
reflect whether or not a change point has occurred at instant r, that is,

U_r = 1, if θ_r = θ_{r−1};   U_r = 0, if θ_r ≠ θ_{r−1},

for r = 2, . . . , n. Thus, the random quantity ρ is perfectly identified by the
vector (U_2, . . . , U_n), n > 2. Consequently, we can estimate the posterior probability of each
particular partition ρ = {i0, i1, . . . , ib} by computing the proportion of sampled vectors
(U_2, . . . , U_n) compatible with it, that is, those with U_r = 0 exactly at the change instants
and U_r = 1 otherwise.

Similarly, the same samples can be used to estimate the posterior distribution of
B, noticing that

B = 1 + Σ_{r=2}^{n} (1 − U_r).
The vector (U^k_2, . . . , U^k_n) at step k is generated by Gibbs sampling as
follows. Starting from an initial sample (U⁰_2, . . . , U⁰_n) of the random vector (U_2, . . . , U_n), at
step k the r-th element U^k_r is generated from the conditional distribution of

U_r | U^k_2, . . . , U^k_{r−1}, U^{k−1}_{r+1}, . . . , U^{k−1}_n; X_1, . . . , X_n,

r = 2, . . . , n. To generate the vectors above, it is sufficient to consider the ratios given by
the following expressions:

R_r = P(U_r = 1 | A^k_r; X_1, . . . , X_n) / P(U_r = 0 | A^k_r; X_1, . . . , X_n),

r = 2, . . . , n, where A^k_r = {U^k_2 = u_2, . . . , U^k_{r−1} = u_{r−1}, U^{k−1}_{r+1} = u_{r+1}, . . . , U^{k−1}_n = u_n}. Hence,
considering a degenerate prior distribution for p, we have that

R_r = c*[xy] / ( c*[xr] c*[ry] ),

where c*[ij] is the posterior cohesion for the block [ij],

x = max{ i ∈ {2, . . . , r − 1} : U^k_i = 0 },   if U^k_i = 0 for some i ∈ {2, . . . , r − 1},   and x = 0 otherwise,

and

y = min{ i ∈ {r + 1, . . . , n − 1} : U^{k−1}_i = 0 },   if U^{k−1}_i = 0 for some i ∈ {r + 1, . . . , n − 1},   and y = n otherwise.
Consequently, the criterion for updating the vectors (U^k_2, . . . , U^k_n) becomes

U^k_r = 1, if c*[xy] / ( c*[xr] c*[ry] ) ≥ (1 − u)/u,   and U^k_r = 0 otherwise,

r = 2, . . . , n, where u is a random number drawn from the uniform distribution U(0, 1). This
completes the procedure to estimate the posterior distributions of the random partition ρ
and of the number of blocks B. (Loschi et al. (2003) extend the PPM presented in this
paper by considering a beta prior distribution for p. In that case, the choice of p seems
less arbitrary, since the beta family is rich enough to describe the uncertainty about p under
many practical circumstances. For example, a proper non-informative prior distribution for
p can be specified by setting both beta parameters equal to 1. A comparison between the
results obtained here and those obtained by Loschi et al. (2003) can be found in Loschi and
Cruz (2002a), which concludes that the product estimates obtained by using a degenerate
prior distribution for p are similar to those obtained with a beta prior distribution whose
modal value is close to that fixed value of p.)
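Putting the pieces together, the whole updating scheme can be sketched as follows. This is an illustrative implementation, not the authors' C++ code: it assumes a single set of NIG hyperparameters (m, v, a, d) for every block, and it indexes the indicators so that U_r = 0 marks a block boundary between positions r and r + 1, which matches the convention above up to a shift of the index:

```python
import math
import random

def gibbs_partition(x, p, n_iter=500, burn_in=100, hyper=(0.0, 1.0, 0.01, 4.0)):
    """Gibbs sampler for the change point indicators of Section 3.2 (a sketch).

    x : data sequence; p : fixed (degenerate prior) change probability;
    hyper = (m, v, a, d) : common NIG hyperparameters, an assumption made here
    for simplicity. Returns post-burn-in samples of the number of blocks B.
    """
    n = len(x)
    m, v, a, d = hyper

    def log_cstar(i, j):
        # log posterior cohesion of block [ij] = {i+1, ..., j}, Sections 2.2 and 3.1
        blk = x[i:j]
        k = j - i
        xbar = sum(blk) / k
        q = sum((xr - xbar) ** 2 for xr in blk) + k * (xbar - m) ** 2 / (k * v + 1.0)
        lp = (k - 1) * math.log(1.0 - p) + (math.log(p) if j < n else 0.0)
        lf = (math.lgamma((d + k) / 2.0) - math.lgamma(d / 2.0)
              - 0.5 * k * math.log(math.pi) + 0.5 * d * math.log(a)
              - 0.5 * math.log(1.0 + k * v)
              - 0.5 * (d + k) * math.log(a + q))
        return lp + lf

    U = [1] * (n - 1)   # U[r-1] = 1: no block boundary between positions r and r+1
    samples = []
    for it in range(n_iter):
        for r in range(1, n):
            # nearest boundaries to the left and right of r (0 and n if none)
            xlo = max((i for i in range(1, r) if U[i - 1] == 0), default=0)
            yhi = min((i for i in range(r + 1, n) if U[i - 1] == 0), default=n)
            log_R = log_cstar(xlo, yhi) - log_cstar(xlo, r) - log_cstar(r, yhi)
            # P(U_r = 1 | rest) = R / (1 + R), evaluated stably on the log scale
            if log_R >= 0:
                prob_keep = 1.0 / (1.0 + math.exp(-log_R))
            else:
                prob_keep = math.exp(log_R) / (1.0 + math.exp(log_R))
            U[r - 1] = 1 if random.random() < prob_keep else 0
        if it >= burn_in:
            samples.append(1 + sum(1 - u for u in U))   # number of blocks B
    return samples
```

The posterior distribution of B is then estimated by the empirical frequencies of the sampled values, and the partition ρ is recovered in the same way from the sampled indicator vectors.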
4 Applications: The Chilean Stock Market Behavior
The goal of this section is twofold: to present a sensitivity analysis for the PPM, assuming
different degenerate prior distributions for p, and to identify multiple change points in the
mean (or expected return) and variance (volatility) of the returns of the Endesa stock series
(Figure 1) within the period from 1987 to 1994, using the methodology developed in the
previous sections. As usual in finance, the return series is defined by the transformation
X_t = (P_t − P_{t−1})/P_{t−1}, where P_t is the price in month t. Defined in this way, the returns
within each block can be considered normally distributed, given the expected return and the
volatility (Correa, 1998).
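The return transformation can be written directly; the price values below are hypothetical:

```python
prices = [100.0, 104.0, 101.0, 103.0]    # hypothetical monthly prices P_t
# X_t = (P_t - P_{t-1}) / P_{t-1}
returns = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]
```

Each return is the relative price change over one month, so a series of n prices yields n − 1 returns.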
Figure 1: Returns of ENDESA. [Time series plot of the monthly returns against year, 1987–1995; returns range from about −0.2 to 0.4.]
4.1 Sensitivity analysis
We adopt the following normal-inverted-gamma prior specification to describe the uncertainty
about the parameter (µ[ij], σ²[ij]):

µ[ij] | σ²[ij] ∼ N(0, σ²[ij])   and   σ²[ij] ∼ IG(0.01/2, 4/2).
We also consider the prior cohesions given in (3.1). Since a small number of changes is
expected, we consider p = 0.01 and p = 0.1 to evaluate the influence of these prior specifications on
the posterior estimates of µ, σ², B and ρ. We also consider two very different prior specifications
for the chance of a change occurring, p = 0.5 and p = 0.9; in these cases, a higher
number of change points is expected a priori.

In the Gibbs sampling scheme, we generate 5,000 samples of the 94-dimensional vector (U_2, . . . , U_n),
starting from a vector of zeros. We discarded the initial 1,000 iterations as burn-in, after which
convergence had been reached. A lag of 1 is selected, since the correlation among successive vectors is low.
The algorithms used here were coded in C++. All experiments were performed on a 166 MHz PC
with 32 MB RAM, running Windows 98, using the freely available C++ compiler
DJGPP (http://www.delorie.com/djgpp).
Figures 2 and 3 show the posterior estimates of µk and σ²k, k = 1, . . . , 95, that is, of the
monthly mean returns and volatility, respectively. The product estimates of µ (σ²) are
contrasted with the centered arithmetic moving average (moving variance) of order 10.
It is noticeable that more instants are identified as change points
when higher values of p are considered. We also notice that similar estimates are obtained for
close values of p. For p = 0.1, we observe that the estimates obtained using the PPM are very
similar to the naïve estimates.
Figure 2: Posterior means of µ. [Four panels, for p = 0.01, 0.1, 0.5 and 0.9; each panel plots the data (∗), the product estimates of the mean return, and the running moving average against year, 1987–1995.]

Figure 3: Posterior means of σ². [Four panels, for p = 0.01, 0.1, 0.5 and 0.9; each panel plots the volatility estimates and the moving variance against year, 1987–1995.]
Figure 4 presents the most probable partition for different values of p. Similarly to the
conclusions drawn from Figures 2 and 3, we observe that for higher values of p more
instants are identified as change points.
Figure 4: Posterior distribution of ρ
[Four panels (p = 0.01, 0.1, 0.5, 0.9): the Endesa returns against year, 1987-1995, with the
most probable partition superimposed.]
Table 1 presents the prior and posterior probabilities of the most probable partition. Notice
that the probability of the most probable partition increases substantially from the prior to
the posterior evaluation.
Table 1: Prior and posterior probability of the most probable partition

    p        Prior probability    Posterior probability
    0.010    4.007 x 10^-7        0.3567
    0.100    1.593 x 10^-16       0.0173
    0.500    2.524 x 10^-29       0.0013
    0.900    1.161 x 10^-13       0.0285
From Figure 5 we can notice that the posterior distribution of the number of blocks B in the
partition (or, equivalently, of the number of change points B - 1 in the time series) is
unimodal regardless of the value assumed for p. We can also notice that if p is small the
posterior distribution of B is centered at lower values (see Table 2 for descriptive statistics
of the posterior distribution of B). It is also noticeable that, for all values of p, the posterior
probability of having one or more change points in the Endesa series is one.
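A minimal sketch of how such a posterior for B can be tabulated from Gibbs draws of the change-point indicator vectors; the toy draws below are hypothetical stand-ins for actual sampler output, not values from the Endesa analysis:

```python
from collections import Counter

def posterior_of_B(samples):
    """Estimate P(B = b | data) from Gibbs draws of change-point indicators.

    Each sample is a 0/1 vector over the n - 1 candidate instants; the
    number of blocks B is the number of change points plus one.
    """
    counts = Counter(1 + sum(s) for s in samples)
    total = len(samples)
    return {b: counts[b] / total for b in sorted(counts)}

# toy draws standing in for real sampler output
draws = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(posterior_of_B(draws))  # {1: 0.25, 2: 0.5, 3: 0.25}
```

The same dictionary of relative frequencies is what a histogram such as Figure 5 displays.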
Figure 5: Posterior distribution of B
[Four panels (p = 0.01, 0.1, 0.5, 0.9): posterior probability against the number of blocks,
0 to 80.]
Table 2: Descriptive statistics - prior and posterior distributions of B

            Prior Distribution     Posterior Distribution
    p       Mean     Variance      Mean     Variance   Mode   Median   Q1   Q3
    0.010    0.940    0.9306        5.093    2.158       4      4       4    6
    0.100    9.400    8.4600       17.075    9.682      16     17      15   19
    0.500   47.000   23.5000       50.521   23.436      50     50      47   54
    0.900   84.600    8.4600       84.753    9.263      85     85      83   87
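Under the geometric cohesions, each of the n - 1 candidate instants is a change point independently with probability p, so the prior number of change points B - 1 is Binomial(n - 1, p). The sketch below assumes n = 95 observations; that value is an inference from the table (94p matches the prior mean column exactly), not a figure stated in this section:

```python
def prior_B_moments(n, p):
    """Prior mean and variance of the number of change points B - 1,
    which is Binomial(n - 1, p) under the geometric prior cohesions."""
    m = n - 1
    return m * p, m * p * (1 - p)

# n = 95 reproduces the prior mean/variance columns of Table 2
for p in (0.01, 0.1, 0.5, 0.9):
    mean, var = prior_B_moments(95, p)
    print(p, round(mean, 3), round(var, 4))
```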
Notice from Table 2 that the location summaries (mean, mode, median) of the posterior
distribution of B, as well as the mean of the prior distribution of B, increase as p increases.
We also observe that the posterior variance is higher than the prior variance for p = 0.01,
0.1, and 0.9; the opposite holds for p = 0.5. It is also noticeable that the posterior variance
increases with p for values of p up to 0.5. (See more about the influence of prior specifications
in the PPM in Loschi and Cruz (2002a,b).)
4.2 A note on the model specification
We suppose that, conditionally on the average stock return and its total standard deviation,
any path followed by the returns within a block presenting the same average return and
total standard deviation is "equally likely" to occur, which is mathematically expressed
by the O(1)-invariance assumption amongst the returns. Hence, assuming extendibility
(that is, assuming that all subsequences (X_{i+1}, . . . , X_j) are part of an infinite O(1)-invariant
sequence), we have that the joint distribution of the Endesa returns in the same block, X[ij],
can be represented as a mixture of products of normal distributions N(µ[ij], σ2[ij])
(Smith, 1981), which agrees with Correa's (1998) assumptions about the Chilean market.
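Written out, the representation just described takes the following form (the notation π for the mixing distribution over (µ[ij], σ2[ij]) is an assumption of this sketch, not taken from the paper):

```latex
f\bigl(x_{[ij]}\bigr)
  = \int \prod_{r=i+1}^{j}
      \frac{1}{\sqrt{2\pi\sigma^{2}_{[ij]}}}
      \exp\!\left\{-\frac{\bigl(x_r-\mu_{[ij]}\bigr)^{2}}
                         {2\sigma^{2}_{[ij]}}\right\}
    \, d\pi\bigl(\mu_{[ij]},\sigma^{2}_{[ij]}\bigr),
```

that is, the returns within a block are conditionally i.i.d. N(µ[ij], σ2[ij]) given the block parameters, with the marginal model obtained by mixing over π.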
We also assume the conditions in (2.16), understanding that these conditions capture the
considerations made by Mandelbrot (1963), as well as what Maeda (1996) suggests to be
reasonable for the Chilean market: that large returns tend to be followed by large returns,
that small returns tend to be followed by small returns, and that changes in this behavior
are produced by unanticipated information.
These assumptions lead to a predictive distribution with heavy tails (a Student-t distribution)
for the returns in the same block, which also discloses a correlation structure amongst the
returns. Since the Chilean stock market is an emerging market, and thus can experience more
changes than a developed market because it is more susceptible to the political atmosphere,
the Student-t distribution is more appropriate to describe the behavior of its stock returns
(Duarte Jr. and Mendes, 1997; Mendes, 2000). (Notice that the normality assumption
adopted by Hsu (1984) (see also Hawkins (2001)) to describe the behavior of the Dow Jones
Industrial Average is stronger than the assumptions we made: we only state that the data
are conditionally normally distributed.)
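The heavy-tail point can be seen directly by comparing densities. The sketch below (plain Python; ν = 3 degrees of freedom is chosen only for illustration and is not a value from the paper) computes the ratio of a standard Student-t density to the standard normal density far in the tail:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def student_t_pdf(x, nu, mu=0.0, sigma=1.0):
    """Density of a Student-t with nu d.f., location mu, scale sigma, at x."""
    z = (x - mu) / sigma
    c = math.gamma((nu + 1.0) / 2.0) / (
        math.gamma(nu / 2.0) * math.sqrt(nu * math.pi) * sigma)
    return c * (1.0 + z * z / nu) ** (-(nu + 1.0) / 2.0)

# far in the tail, the Student-t assigns much more mass than the normal
print(student_t_pdf(4.0, nu=3) / normal_pdf(4.0))
```

The ratio is well above one, illustrating why a Student-t predictive accommodates extreme returns that a normal model would treat as nearly impossible.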
The prior cohesions given in (3.1) and considered in the analysis of the Endesa series imply
that the sequence of change points forms a discrete renewal process, with identically
geometrically distributed occurrence times. This type of product partition distribution
represents reasonably well the situation described by Mandelbrot (1963) (and later by
Maeda (1996) for the Chilean stock market), who established that changes in the behavior
of a series of stock returns are a consequence of the receipt of information not previously
anticipated, so that past change points are noninformative about future change points.
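Under such cohesions, the prior probability of a partition is the product of its block cohesions. A minimal sketch of evaluating it, assuming Yao-style cohesions c[ij] = p(1-p)^(j-i-1) for j < n and c[in] = (1-p)^(n-i-1) (the indexing convention here is an assumption of this illustration, not a transcription of (3.1)):

```python
def prior_partition_prob(endpoints, n, p):
    """Prior probability of the partition of instants 0 < ... < n whose
    interior blocks end at the given instants (each endpoint < n).

    A block [i, j] with j < n contributes p * (1 - p)**(j - i - 1);
    the final block, ending at n, contributes (1 - p)**(n - i - 1).
    """
    prob, i = 1.0, 0
    for j in endpoints:                  # interior block endpoints, j < n
        prob *= p * (1.0 - p) ** (j - i - 1)
        i = j
    return prob * (1.0 - p) ** (n - i - 1)   # final block ends at n

# with no change points, the single block [0, n] has probability (1-p)^(n-1)
print(prior_partition_prob([], 95, 0.01))
```

Prior probabilities of this form are what the "Prior probability" column of Table 1 reports for the most probable partition under each p.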
5 Conclusions
In this paper we have applied the PPM to identify multiple change points in normal means
and variances for data sequences, extending previous results of Barry and Hartigan (1993)
and Crowley (1997). We have proposed a Gibbs sampling scheme to estimate the posterior
distributions of the number of change points as well as of the instants when the changes
occurred. We have applied the method to identify change points in the mean return and
the volatility of the Endesa stock returns and provided a sensitivity analysis for the PPM.
The results indicate that the procedures proposed to compute the posterior estimates of B
and ρ are quite effective, simple, and easy to implement. We also conclude that the prior
specification for p strongly influences both the posterior distributions of the number of
change points and of the instants when changes occurred, as well as the product estimates
of the mean and the variance. Since p is crucial for the inferences, we could instead estimate
p by assuming a prior distribution for it. In that case, however, the conjugacy of the PPM
is lost and Yao's procedure cannot be used (see Loschi and Cruz (2002a)). A procedure to
obtain the posterior distribution of p using Gibbs sampling can be found in Loschi and
Cruz (2003).
We believe that some improvement would be obtained if Yao's procedure could be modified
so that a prior distribution for p could be included. An alternative algorithm is considered
in Quintana and Iglesias (2003) in connection with the nonparametric approach to cluster
analysis.
Some open questions remain. Can different prior specifications for the mean and variance
affect the product estimates? Would it be possible to find even simpler implementations of
the PPM? How well does the methodology perform in the presence of outliers? These and
other similar questions are interesting topics for future research in this area.
6 Acknowledgements
This research was supported in part by PRPq-UFMG, grant 4801-UFMG/RTR/FUNDO/
PRPq/RECEM DOUTORES/00; CAPES; FONDECYT, grants 8000004, 1971128, and
1990431; and Fundación Andes (Chile). The authors would like to thank Heleno Bolfarine
and Wilfredo Palma for their valuable comments and contributions to this paper.
References
Arellano-Valle, R. B. and H. Bolfarine. On some characterizations of the t-distribution.
Statistics & Probability Letters, 25:79–85, 1995.
Arellano-Valle, R. B., H. Bolfarine, and P. L. Iglesias. A predictivistic interpretation of the
multivariate t distribution. Test , 2(3):221–236, 1994.
Barry, D. and J. A. Hartigan. Product partition models for change point problems. The
Annals of Statistics, 20(1):260–279, 1992.
Barry, D. and J. A. Hartigan. A Bayesian analysis for change point problems. Journal of
the American Statistical Association, 88(421):309–319, 1993.
Box, G. E. P. and G.C. Tiao. Bayesian Inference in Statistical Analysis. Addison-Wesley,
New York, USA, 1973.
Chen, C. W. S. and J. C. Lee. Bayesian inference of threshold autoregressive models. Journal
of Time Series Analysis, 16(5):483–492, 1995.
Chernoff, H. and S. Zacks. Estimating the current mean of a normal distribution which is
subjected to changes in time. Annals of Mathematical Statistics, 35:999–1018, 1964.
Correa, L. Modelación Bayesiana de puntos de cambio en la volatilidad. Master's thesis,
Facultad de Matemáticas, Pontificia Universidad Católica de Chile, Chile, 1998. (in
Spanish).
Crowley, E. M. Product partition models for normal means. Journal of the American
Statistical Association , 92(437):192–198, 1997.
de Finetti, B. La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut
Henri Poincaré, 7:1–68, 1937.
Diaconis, P., M. L. Eaton, and S. L. Lauritzen. Finite de Finetti theorems in linear models
and multivariate analysis. Scandinavian Journal of Statistics, 19:298–315, 1992.
Diaconis, P. and D. Ylvisaker. Conjugate priors for exponential families. Annals of Statistics,
7:269–281, 1979.
Diaconis, P. and D. Ylvisaker. Quantifying prior opinion. In J. M. Bernardo, M. H. DeGroot,
D. V. Lindley, and A. F. M. Smith, editors, Bayesian Statistics 2, pages 133–156. North-
Holland, Elsevier Science, 1985.
Duarte Jr., A. M. and B. V. M. Mendes. Product partition models for normal means.
Emerging Markets Quarterly , 1(4):85–95, 1997.
Gamerman, D. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference.
Chapman and Hall, London, UK, 1997.
Gardner, L. A. On detecting changes in the mean of normal variates. Annals of Mathematical
Statistics, 40:116–126, 1969.
Gelfand, A. E. and A. F. M. Smith. Sampling-based approaches to calculating marginal
densities. Journal of the American Statistical Association , 85:398–409, 1990.
Geweke, J. and N. Terui. Bayesian threshold autoregressive models for nonlinear time series.
Journal of Time Series Analysis, 14(5):441–454, 1993.
Hartigan, J. A. Partition models. Communications in Statistics - Theory and Methods,
19(8):2745–2756, 1990.
Hawkins, D. M. Fitting multiple change-point models to data. Computational Statistics &
Data Analysis, 37:323–341, 2001.
Hsu, D. A. A Bayesian robust detection of shift in the risk structure of stock market returns.
Journal of the American Statistical Association, 77(2):407–416, 1984.
Iglesias, P. L. Formas finitas do teorema de de Finetti: A visão preditivista da inferência
estatística em populações finitas. PhD thesis, Departamento de Estatística, Instituto de
Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil, 1993. (in
Portuguese).
Kingman, J. F. C. On random sequences with spherical symmetry. Biometrika, 59:183–197,
1972.
Loschi, R. H. and F. R. B. Cruz. An analysis of the influence of some prior specifications in
the identification of change points via product partition model. Computational Statistics
& Data Analysis, 39:477–501, 2002.
Loschi, R. H. and F. R. B. Cruz. Applying the product partition model to the identification
of multiple change points. Advances in Complex Systems, 5(4):371–387, 2002.
Loschi, R. H. and F. R. B. Cruz. Extension to the product partition model: Computing the
probability of a change. Manuscript submitted for publication, 2003.
Loschi, R. H., F. R. B. Cruz, P. L. Iglesias, and R. B. Arellano-Valle. A Gibbs sampling
scheme to the product partition model: An application to change-point problems. Computers
& Operations Research, 30(3):463–482, 2003.
Loschi, R. H., P. L. Iglesias, and R. B. Arellano-Valle. Predictivistic characterization of
multivariate Student-t models. Journal of Multivariate Analysis, 85(1):10–23, 2003.
Maeda, M. A. Volatilidad estocástica en el mercado accionario chileno. Master's thesis,
Facultad de Ciencias Económicas y Administrativas, Universidad de Chile, Chile, 1996.
(in Spanish).
Mandelbrot, B. The variation of certain speculative prices. Journal of Business, 36:394–419,
1963.
Mendes, B. V. M. Computing robust risk measures in emerging equity markets using extreme
value theory. Emerging Markets Quarterly , pages 24–41, 2000.
Menzefricke, U. A Bayesian analysis of a change in the precision of a sequence of independent
normal random variables at an unknown time point. Applied Statistics, 30(2):141–146,
1981.
Carlstein, E., H.-G. Mueller, and D. Siegmund, editors. Change-Point Problems. IMS
Lecture Notes - Monograph Series, 23, USA, 1994.
O’Hagan, A.Kendall’s Advanced Theory of Statistics 2A
, chapter Bayesian Inference. JohnWiley & Sons, New York, NY, 1994.
Quintana, F. A. and P. L. Iglesias. Nonparametric Bayesian clustering and product partition
models. Journal of the Royal Statistical Society B (to appear), 2003.
Smith, A. F. M. A Bayesian approach to inference about a change-point in a sequence of
random variables. Biometrika , 62(2):407–416, 1975.
Smith, A. F. M. On random sequences with centered spherical symmetry. Journal of the
Royal Statistical Society, B , 43:203–241, 1981.
Wechsler, S. Exchangeability and predictivism. Erkenntnis: International Journal of
Analytic Philosophy, 38:343–350, 1993.
Yao, Y. Estimation of a noisy discrete-time step function: Bayes and empirical Bayes
approaches. The Annals of Statistics, 12(4):1434–1447, 1984.