On the stability of clustering financial time series



On the Stability of Clustering Financial Time Series – How to investigate?

IEEE ICMLA Miami, Florida, USA, December 9-11, 2015

Gautier Marti, Philippe Very, Philippe Donnat, Frank Nielsen

9 December 2015

Gautier Marti, Philippe Donnat On the Stability of Clustering Financial Time Series


1 Introduction to financial time series clustering

2 Empirical results from the clustering stability study

3 Conclusion


Financial time series (data from www.datagrapple.com)


Clustering?

Definition

Clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in different groups.

French banks (blue) and building materials (red) CDS over 2006-2015


Why clustering?

Mathematical finance: use of variance-covariance matrices (e.g., Markowitz, Value-at-Risk)

Stylized fact: empirical variance-covariance matrices estimated on financial time series are very noisy (Random Matrix Theory; Noise Dressing of Financial Correlation Matrices, Laloux et al., 1999)

Figure: Marchenko-Pastur distribution vs. empirical eigenvalue distribution of the correlation matrix (density ρ(λ) against eigenvalue λ)

How to filter these variance-covariance matrices?
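A minimal numerical sketch of the random-matrix argument, assuming i.i.d. Gaussian returns as the null model (the sizes and data are illustrative, not from the study). It computes the Marchenko-Pastur support and density for q = N/T and counts how many eigenvalues of a pure-noise correlation matrix fall outside that support. The helper names are hypothetical.

```python
import numpy as np


def marchenko_pastur_bounds(n_assets, n_obs, sigma2=1.0):
    """Edges of the Marchenko-Pastur support for q = n_assets / n_obs <= 1."""
    q = n_assets / n_obs
    return sigma2 * (1 - np.sqrt(q)) ** 2, sigma2 * (1 + np.sqrt(q)) ** 2


def marchenko_pastur_pdf(lam, n_assets, n_obs, sigma2=1.0):
    """Marchenko-Pastur density rho(lambda); zero outside [lambda_min, lambda_max]."""
    q = n_assets / n_obs
    lo, hi = marchenko_pastur_bounds(n_assets, n_obs, sigma2)
    lam = np.asarray(lam, dtype=float)
    pdf = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    pdf[inside] = np.sqrt((hi - lam[inside]) * (lam[inside] - lo)) / (
        2 * np.pi * q * sigma2 * lam[inside]
    )
    return pdf


# Pure-noise "returns": by construction there is no true correlation between assets.
rng = np.random.default_rng(0)
n_assets, n_obs = 100, 500
returns = rng.standard_normal((n_obs, n_assets))
corr = np.corrcoef(returns, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)

lo, hi = marchenko_pastur_bounds(n_assets, n_obs)
n_outside = int(np.sum((eigenvalues < lo) | (eigenvalues > hi)))
print(f"MP support: [{lo:.3f}, {hi:.3f}]; eigenvalues outside it: {n_outside}")
```

On real returns, eigenvalues lying well above λ_max are the ones usually read as signal; those inside the band are indistinguishable from this noise benchmark.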


For filtering, clustering!

The work of Mantegna et al. (1999):

Figure: (left) empirical correlation matrix; (center) the same matrix seriated using a hierarchical clustering; (right) correlations filtered using the clustering structure

N.B. Other applications: statistical arbitrage, alternative risk measures
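The three panels can be mimicked on synthetic data. This is a sketch under explicit assumptions: the distance is sqrt(2(1 - ρ)), the linkage is single linkage (the minimum-spanning-tree view associated with Mantegna's work), seriation simply sorts assets by cluster label, and the filter replaces each cluster-by-cluster block by its average correlation. The slide does not state which filtering rule produced the right panel, so `block_average_filter` and the other helpers are hypothetical.

```python
import numpy as np


def single_linkage_labels(dist, k):
    """k-cluster cut of single linkage.

    Single linkage merges clusters in the same order as Kruskal's minimum
    spanning tree algorithm, so stopping after n - k merges gives k clusters.
    """
    n = len(dist)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    merges = 0
    for _, i, j in sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n)):
        if merges == n - k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            merges += 1
    return np.unique([find(i) for i in range(n)], return_inverse=True)[1]


def block_average_filter(corr, labels):
    """Replace every (cluster, cluster) block by its average correlation; keep a unit diagonal."""
    filtered = np.empty_like(corr)
    for a in np.unique(labels):
        ia = labels == a
        for b in np.unique(labels):
            ib = labels == b
            block = corr[np.ix_(ia, ib)]
            if a == b:
                m = block.shape[0]
                # average of the off-diagonal entries inside the cluster
                value = (block.sum() - np.trace(block)) / (m * (m - 1)) if m > 1 else 1.0
            else:
                value = block.mean()
            filtered[np.ix_(ia, ib)] = value
    np.fill_diagonal(filtered, 1.0)
    return filtered


# Hypothetical data: 16 assets driven by 3 sector factors, columns shuffled.
rng = np.random.default_rng(42)
group_sizes = [5, 7, 4]
factors = rng.standard_normal((len(group_sizes), 1000))
returns = np.column_stack([
    0.8 * factors[g] + 0.6 * rng.standard_normal(1000)
    for g, size in enumerate(group_sizes)
    for _ in range(size)
])
perm = rng.permutation(returns.shape[1])
returns = returns[:, perm]  # hide the group structure
true_labels = np.repeat(np.arange(len(group_sizes)), group_sizes)[perm]

corr = np.corrcoef(returns, rowvar=False)          # left panel
dist = np.sqrt(np.clip(2 * (1 - corr), 0, None))  # assumed correlation distance
labels = single_linkage_labels(dist, k=3)
order = np.argsort(labels, kind="stable")
seriated = corr[np.ix_(order, order)]              # center panel
filtered = block_average_filter(corr, labels)      # right panel
```

The filtered matrix keeps only one number per pair of clusters, which is how the clustering structure removes part of the estimation noise.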


Why stability?

Statistical consistency of the clustering method requires assumptions that may not hold in practice: e.g. returns are i.i.d., the underlying copula is elliptical, enough data is available

Stability is a weaker property: reproducibility of results across a wide range of slight data perturbations

Clusters obtained at times t, t + 1, t + 2: is the difference between the successive clusters a “true” signal?
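The slides do not name the score used to compare two clusterings; the adjusted Rand index (ARI) is one standard choice and is assumed here. It equals 1 when two partitions agree up to relabeling and is close to 0 for unrelated partitions. The labels below are made up.

```python
import numpy as np


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same assets."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)  # contingency table of co-assignments

    def pairs(counts):
        # number of unordered pairs inside each count, summed
        return float(np.sum(counts * (counts - 1) / 2))

    n = len(a)
    index = pairs(table)
    sum_a, sum_b = pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = sum_a * sum_b / (n * (n - 1) / 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (index - expected) / (max_index - expected)


# Hypothetical cluster labels for six assets at times t and t + 1.
clusters_t = [0, 0, 0, 1, 1, 1]
clusters_t1 = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_index(clusters_t, clusters_t1))
print(adjusted_rand_index(clusters_t, [1, 1, 1, 0, 0, 0]))  # same partition, relabeled
```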


Is the clustering of financial time series stable?

According to [2], clusters are not stable with respect to the clustering algorithm, but only a squared Euclidean distance was considered, which is not relevant for clustering assets from their returns (cf. [4]).

Idea: A more relevant distance should increase stability

We investigate the clustering stability resulting from using:

a Euclidean distance

a Pearson correlation distance [3]

a Spearman correlation distance

a distance for comparing two dependent random variables [4]


Some usual distances for clustering financial time series

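Prices $(P^i_t)_{t \ge 0}$; log returns $S^i_{t+1} = \log P^i_{t+1} - \log P^i_t$, giving $(S^i_t)_{t \ge 1}$

Euclidean distance: $d(S^i, S^j) = \sqrt{\sum_{t=1}^T (S^i_t - S^j_t)^2}$

Pearson correl.: $\rho(S^i, S^j) = \dfrac{\sum_{t=1}^T (S^i_t - \bar{S}^i)(S^j_t - \bar{S}^j)}{\sqrt{\sum_{t=1}^T (S^i_t - \bar{S}^i)^2} \sqrt{\sum_{t=1}^T (S^j_t - \bar{S}^j)^2}}$

Spearman correl.: $\rho_S(S^i, S^j) = 1 - \dfrac{6}{T(T^2 - 1)} \sum_{t=1}^T \left(S^i_{(t)} - S^j_{(t)}\right)^2$, where $S^i_{(t)}$ denotes the rank of $S^i_t$ among $S^i_1, \dots, S^i_T$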
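A direct transcription of these formulas in NumPy, as a sketch: ranks assume no ties, and turning a correlation into a distance via sqrt(2(1 - ρ)) is an assumption (a common choice, not stated on this slide). The prices are simulated and the function names are hypothetical.

```python
import numpy as np


def log_returns(prices):
    """S_{t+1} = log P_{t+1} - log P_t, column-wise (one column per asset)."""
    return np.diff(np.log(prices), axis=0)


def euclidean_distance(s_i, s_j):
    return float(np.sqrt(np.sum((s_i - s_j) ** 2)))


def pearson_correlation(s_i, s_j):
    x, y = s_i - s_i.mean(), s_j - s_j.mean()
    return float(np.sum(x * y) / (np.sqrt(np.sum(x**2)) * np.sqrt(np.sum(y**2))))


def ranks(s):
    """1-based ranks; assumes no ties, which is reasonable for continuous returns."""
    return np.argsort(np.argsort(s)) + 1


def spearman_correlation(s_i, s_j):
    T = len(s_i)
    d = ranks(s_i) - ranks(s_j)
    return float(1 - 6 * np.sum(d**2) / (T * (T**2 - 1)))


def correlation_distance(rho):
    """Assumed mapping from a correlation to a distance: sqrt(2 * (1 - rho))."""
    return float(np.sqrt(max(0.0, 2 * (1 - rho))))


# Hypothetical prices: two geometric random walks over 251 days.
rng = np.random.default_rng(7)
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal((251, 2)), axis=0))
S = log_returns(prices)
s_i, s_j = S[:, 0], S[:, 1]

print("Euclidean:", euclidean_distance(s_i, s_j))
print("Pearson distance:", correlation_distance(pearson_correlation(s_i, s_j)))
print("Spearman distance:", correlation_distance(spearman_correlation(s_i, s_j)))
```

Because Spearman's ρ depends only on ranks, `spearman_correlation(s_i, s_i**3)` is exactly 1, whereas Pearson's ρ between the same two series is lower.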


Generic Non-Parametric Distance [4]

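$d_\theta^2(X_i, X_j) = \theta \, 3 \, \mathbb{E}\left[ |P_i(X_i) - P_j(X_j)|^2 \right] + (1 - \theta) \, \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda$, where $P_i$ denotes the distribution function of $X_i$ and $\lambda$ the Lebesgue measure

(i) $0 \le d_\theta \le 1$, (ii) for $0 < \theta < 1$, $d_\theta$ is a metric, (iii) $d_\theta$ is invariant under diffeomorphism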
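A naive plug-in estimator of $d_\theta^2$, offered as an illustration and not necessarily the estimator used in [4]: normalized ranks stand in for $P_i(X_i)$ in the dependence term, and histograms on common bins approximate the Hellinger term (the default of 30 bins is arbitrary). With histograms, property (iii) holds only approximately; the rank-based term is exactly unchanged when an increasing transformation is applied to either series. Function names are hypothetical.

```python
import numpy as np


def copula_part(x, y):
    """Estimate 3 E|P_i(X_i) - P_j(X_j)|^2 with normalized ranks (empirical CDF values)."""
    T = len(x)
    u = (np.argsort(np.argsort(x)) + 1) / T
    v = (np.argsort(np.argsort(y)) + 1) / T
    return float(3 * np.mean((u - v) ** 2))


def hellinger_part(x, y, bins=30):
    """Estimate (1/2) * integral of (sqrt(dP_i) - sqrt(dP_j))^2 with histograms on common bins."""
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    p, q = p / p.sum(), q / q.sum()  # bin probabilities
    return float(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def gnpr_distance_sq(x, y, theta=0.5, bins=30):
    """theta * dependence term + (1 - theta) * distribution term."""
    return theta * copula_part(x, y) + (1 - theta) * hellinger_part(x, y, bins)


rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
z = rng.standard_normal(5000)            # independent of x, same distribution
y = x + 0.5 * rng.standard_normal(5000)  # dependent on x, wider distribution
for theta in (0.0, 0.5, 1.0):
    print(theta, gnpr_distance_sq(x, z, theta), gnpr_distance_sq(x, y, theta))
```

For independent series the dependence term is close to 0.5 (since $3\,\mathbb{E}|U - V|^2 = 1/2$ for independent uniforms), while for identically distributed series the distribution term is close to 0; varying θ shifts the weight between the two kinds of information.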


Generic Non-Parametric Distance [4]

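$d_0^2 = \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda = \mathrm{Hellinger}^2$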

$d_1^2 = 3 \, \mathbb{E}\left[ |P_i(X_i) - P_j(X_j)|^2 \right] = \frac{1 - \rho_S}{2} = 2 - 6 \int_0^1 \int_0^1 C(u, v) \, du \, dv$
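The chain of equalities for $d_1^2$ follows from a short computation, assuming continuous margins so that $U = P_i(X_i)$ and $V = P_j(X_j)$ are uniform on $[0, 1]$ with copula $C$, and using two standard facts: $\mathbb{E}[UV] = \int_0^1\int_0^1 C(u,v)\,du\,dv$ and $\rho_S = 12 \int_0^1\int_0^1 C(u,v)\,du\,dv - 3$.

```latex
\begin{aligned}
3\,\mathbb{E}\left[|U - V|^2\right]
  &= 3\left(\mathbb{E}[U^2] + \mathbb{E}[V^2] - 2\,\mathbb{E}[UV]\right)
   = 3\left(\tfrac{1}{3} + \tfrac{1}{3}\right) - 6\,\mathbb{E}[UV]
   = 2 - 6 \int_0^1\!\!\int_0^1 C(u,v)\,du\,dv, \\
\frac{1 - \rho_S}{2}
  &= \frac{1 - \left(12 \int_0^1\!\int_0^1 C(u,v)\,du\,dv - 3\right)}{2}
   = 2 - 6 \int_0^1\!\!\int_0^1 C(u,v)\,du\,dv.
\end{aligned}
```

Both sides reduce to the same integral of the copula, so $d_1$ depends only on the dependence structure and not on the margins.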

Remark: if $f(x, \theta) = c\left(F_1(x_1; \nu_1), \dots, F_N(x_N; \nu_N); \theta_c\right) \prod_{i=1}^N f_i(x_i; \nu_i)$, then with the CML hypothesis, $ds^2 = ds^2_{\mathrm{copula}} + \sum_{i=1}^N ds^2_{\mathrm{margins}}$

2 Empirical results from the clustering stability study

Sliding Window

PCA stability curve (red) vs. Euclidean clusters stability curve as a function of time, using results from [1] for a fair comparison: clusters are more stable

The most basic perturbation: traders face it every day when monitoring their indicators

We do not want to overfit our analysis to this particular stability goal

Stability performance: dist. [4] ≈ Spearman > Pearson > Euclidean
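A sketch of the sliding-window experiment on simulated data. Everything specific is an assumption: window length, step, number of clusters, single linkage on sqrt(2(1 - ρ)), and the adjusted Rand index as the score between consecutive windows. The helpers are hypothetical and are repeated here so the block runs on its own.

```python
import numpy as np


def single_linkage_labels(dist, k):
    """k-cluster cut of single linkage: the first n - k merges of Kruskal's algorithm."""
    n = len(dist)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    merges = 0
    for _, i, j in sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n)):
        if merges == n - k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            merges += 1
    return np.unique([find(i) for i in range(n)], return_inverse=True)[1]


def adjusted_rand_index(labels_a, labels_b):
    """1.0 when two partitions agree up to relabeling, about 0 when unrelated."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)

    def pairs(counts):
        return float(np.sum(counts * (counts - 1) / 2))

    index, sum_a, sum_b = pairs(table), pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = sum_a * sum_b / (len(a) * (len(a) - 1) / 2)
    max_index = (sum_a + sum_b) / 2
    return 1.0 if max_index == expected else (index - expected) / (max_index - expected)


def cluster_assets(returns, k):
    """Single linkage on the assumed distance sqrt(2 * (1 - rho))."""
    corr = np.corrcoef(returns, rowvar=False)
    return single_linkage_labels(np.sqrt(np.clip(2 * (1 - corr), 0, None)), k)


def sliding_window_stability(returns, window, step, k):
    """ARI between the clusterings of consecutive, overlapping windows."""
    labels = [
        cluster_assets(returns[start:start + window], k)
        for start in range(0, len(returns) - window + 1, step)
    ]
    return [adjusted_rand_index(a, b) for a, b in zip(labels, labels[1:])]


# Hypothetical data: 16 assets in 3 factor-driven groups, 2000 days.
rng = np.random.default_rng(3)
group_sizes = [5, 7, 4]
factors = rng.standard_normal((len(group_sizes), 2000))
returns = np.column_stack([
    0.8 * factors[g] + 0.6 * rng.standard_normal(2000)
    for g, size in enumerate(group_sizes)
    for _ in range(size)
])
scores = sliding_window_stability(returns, window=250, step=50, k=3)
print(f"{len(scores)} window pairs, mean ARI = {np.mean(scores):.3f}")
```

Swapping the distance inside `cluster_assets` (e.g. for one based on Spearman's ρ) is the kind of comparison this slide reports; here the simulated structure never changes, so scores should stay close to 1.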


Odd vs. Even

A clustering algorithm applied on two samples describing the same phenomenon should yield the same results.

How to obtain two of these samples?

Figures: (un)stability of clusters with the L2 distance; stability of clusters with the proposed distance [4]
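The odd vs. even split can be sketched as follows, assuming a correlation-based single-linkage clustering scored with the adjusted Rand index (neither choice is taken from the slide): cluster the observations at positions 0, 2, 4, ... and 1, 3, 5, ... separately, then compare the two partitions. The data and helpers are hypothetical.

```python
import numpy as np


def single_linkage_labels(dist, k):
    """k-cluster cut of single linkage: the first n - k merges of Kruskal's algorithm."""
    n = len(dist)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    merges = 0
    for _, i, j in sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n)):
        if merges == n - k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            merges += 1
    return np.unique([find(i) for i in range(n)], return_inverse=True)[1]


def adjusted_rand_index(labels_a, labels_b):
    """1.0 when two partitions agree up to relabeling, about 0 when unrelated."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)

    def pairs(counts):
        return float(np.sum(counts * (counts - 1) / 2))

    index, sum_a, sum_b = pairs(table), pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = sum_a * sum_b / (len(a) * (len(a) - 1) / 2)
    max_index = (sum_a + sum_b) / 2
    return 1.0 if max_index == expected else (index - expected) / (max_index - expected)


def cluster_assets(returns, k):
    """Single linkage on the assumed distance sqrt(2 * (1 - rho))."""
    corr = np.corrcoef(returns, rowvar=False)
    return single_linkage_labels(np.sqrt(np.clip(2 * (1 - corr), 0, None)), k)


# Hypothetical data: 16 assets in 3 factor-driven groups, 1000 days.
rng = np.random.default_rng(5)
group_sizes = [6, 6, 4]
factors = rng.standard_normal((len(group_sizes), 1000))
returns = np.column_stack([
    0.8 * factors[g] + 0.6 * rng.standard_normal(1000)
    for g, size in enumerate(group_sizes)
    for _ in range(size)
])

odd_days = returns[0::2]   # 1st, 3rd, 5th, ... observations
even_days = returns[1::2]  # 2nd, 4th, 6th, ... observations
score = adjusted_rand_index(cluster_assets(odd_days, 3), cluster_assets(even_days, 3))
print(f"odd vs. even ARI: {score:.3f}")
```

Interleaving the days, rather than splitting the period in two halves, keeps both samples exposed to the same market regimes.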


Economic Regimes

AXA 5-year CDS spread over 2006-2015

Average of the pairwise correlations; correlation skyrockets during crises
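The average pairwise correlation shown on this slide can be computed over rolling windows as sketched below. The regimes are simulated (a weak market factor, then a strong one standing in for a crisis), the window length is an assumption, and the helper names are hypothetical; the study itself uses CDS data.

```python
import numpy as np


def average_pairwise_correlation(returns):
    """Mean of the off-diagonal entries of the Pearson correlation matrix."""
    corr = np.corrcoef(returns, rowvar=False)
    n = corr.shape[0]
    return float((corr.sum() - np.trace(corr)) / (n * (n - 1)))


def rolling_average_correlation(returns, window):
    """One value per trailing window of `window` rows."""
    return np.array([
        average_pairwise_correlation(returns[end - window:end])
        for end in range(window, len(returns) + 1)
    ])


rng = np.random.default_rng(11)
n_assets = 20


def simulate_regime(n_days, loading):
    """Hypothetical one-factor returns: loading * market + idiosyncratic noise."""
    market = rng.standard_normal((n_days, 1))
    return loading * market + rng.standard_normal((n_days, n_assets))


returns = np.vstack([simulate_regime(500, 0.2), simulate_regime(250, 1.0)])
avg_corr = rolling_average_correlation(returns, window=100)
print(f"first window: {avg_corr[0]:.2f}, last window: {avg_corr[-1]:.2f}")
```

With a one-factor model the population pairwise correlation is loading² / (loading² + 1), i.e. about 0.04 in the calm regime and 0.5 in the simulated crisis.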

Is the clustering structure persistent?

Economic Regimes Clustering Stability

Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)


Heart vs. Tails Clustering Stability

≈ orange+red vs. green+yellow periods

Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)


Multiscale

Is the clustering structure persistent to different sampling frequencies?
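One way to obtain lower sampling frequencies, assumed here rather than taken from the slides, is to sum daily log returns over non-overlapping blocks, which is the same as sampling log prices every k days. The prices are simulated and `aggregate_log_returns` is a hypothetical helper; the same clustering and scoring can then be rerun at each scale.

```python
import numpy as np


def aggregate_log_returns(returns, k):
    """Non-overlapping k-period log returns: sum each block of k consecutive rows.

    Trailing rows that do not fill a complete block are dropped.
    """
    n_blocks = returns.shape[0] // k
    blocks = returns[: n_blocks * k].reshape(n_blocks, k, *returns.shape[1:])
    return blocks.sum(axis=1)


# Hypothetical daily prices for 3 assets over 1000 trading days.
rng = np.random.default_rng(2)
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal((1001, 3)), axis=0))
daily = np.diff(np.log(prices), axis=0)

for k, name in [(1, "daily"), (5, "weekly"), (21, "monthly")]:
    scaled = aggregate_log_returns(daily, k)
    corr = np.corrcoef(scaled, rowvar=False)
    print(name, scaled.shape[0], "observations; corr(asset 0, asset 1) =", round(corr[0, 1], 3))
```

Coarser scales leave fewer observations (here 1000, 200, then 47), so part of any loss of stability at low frequency comes simply from noisier correlation estimates.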


Multiscale Clustering Stability

Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)


Maturities & Term Structure

An asset is described by several time series whose dynamics are similar: Nokia Oyj is described here by the cost of insurance against its default for {1, 3, 5, 7, 10} years


Maturities & Term Structure Clustering Stability

Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)

3 Conclusion

Discussion and questions?

A given clustering algorithm yields a particular clustering structure, but with a relevant distance it can be more stable

The perturbations presented can be readily extended (e.g. using different CDS datasets)

Disclosing stability results is interesting since complex models often perform poorly (the many parameters are somewhat overfitted) and cannot be used by practitioners

Correlation+distribution distance (presented in [4]) may work for your applications (which ones?)


[1] C. Ding and X. He. K-means clustering via principal component analysis. In Proceedings of the Twenty-First International Conference on Machine Learning, page 29. ACM, 2004.

[2] V. Lemieux, P. S. Rahmdel, R. Walker, B. Wong, and M. Flood. Clustering techniques and their effect on portfolio formation and risk analysis. In Proceedings of the International Workshop on Data Science for Macro-Modeling, pages 1–6. ACM, 2014.

[3] R. N. Mantegna and H. E. Stanley. Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge University Press, 1999.

[4] G. Marti, P. Very, and P. Donnat. Toward a generic representation of random variables for machine learning. Pattern Recognition Letters, 2015.
