Post on 15-Apr-2017
Introduction to financial time series clustering
Empirical results from the clustering stability study
Conclusion
On the Stability of Clustering Financial Time Series – How to investigate?
IEEE ICMLA Miami, Florida, USA, December 9-11, 2015
Gautier Marti, Philippe Very, Philippe Donnat, Frank Nielsen
9 December 2015
Gautier Marti, Philippe Donnat On the Stability of Clustering Financial Time Series
1 Introduction to financial time series clustering
2 Empirical results from the clustering stability study
3 Conclusion
Financial time series (data from www.datagrapple.com)
Clustering?
Definition
Clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than those in different groups.
French banks (blue) and building materials (red) CDS over 2006-2015
Why clustering?
Mathematical finance: Use of variance-covariance matrices (e.g., Markowitz, Value-at-Risk)
Stylized fact: Empirical variance-covariance matrices estimated on financial time series are very noisy (Random Matrix Theory; Noise Dressing of Financial Correlation Matrices, Laloux et al., 1999)
Figure: Marchenko-Pastur distribution vs. empirical eigenvalue distribution of the correlation matrix (x-axis: λ, y-axis: ρ(λ))
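The comparison between the Marchenko-Pastur law and an empirical eigenvalue spectrum can be sketched as follows. This is only an illustration on synthetic i.i.d. Gaussian returns with unit variance, not the CDS data of the talk; the sizes `N` and `T` and the helper name `marchenko_pastur_pdf` are assumptions made for the example.

```python
import numpy as np

def marchenko_pastur_pdf(lam, q, sigma2=1.0):
    """Marchenko-Pastur density for aspect ratio q = N / T < 1 and variance sigma2.

    Returns the density at `lam` and the edges of its support.
    """
    lam = np.asarray(lam, dtype=float)
    lam_min = sigma2 * (1.0 - np.sqrt(q)) ** 2
    lam_max = sigma2 * (1.0 + np.sqrt(q)) ** 2
    pdf = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    pdf[inside] = np.sqrt((lam_max - lam[inside]) * (lam[inside] - lam_min)) / (
        2.0 * np.pi * sigma2 * q * lam[inside]
    )
    return pdf, lam_min, lam_max

# Synthetic "pure noise" returns: T observations of N uncorrelated assets.
rng = np.random.default_rng(0)
N, T = 100, 500
returns = rng.standard_normal((T, N))

corr = np.corrcoef(returns, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)

_, lam_min, lam_max = marchenko_pastur_pdf(eigenvalues, q=N / T)
share_in_bulk = np.mean((eigenvalues >= lam_min) & (eigenvalues <= lam_max))
print(f"MP support: [{lam_min:.3f}, {lam_max:.3f}]")
print(f"share of eigenvalues inside the support: {share_in_bulk:.2f}")
```

On real returns, eigenvalues lying well above the upper edge are the ones usually read as carrying structure rather than noise.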
How to filter these variance-covariance matrices?
For filtering, clustering!
Work of Mantegna et al. (1999):
Figure, three panels: (left) empirical correlation matrix; (center) the same matrix seriated using a hierarchical clustering; (right) correlations filtered using the clustering structure
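The three panels can be reproduced in spirit with SciPy. The sketch below is one possible filtering, not necessarily the one used for the slide: it assumes the distance d = sqrt(2(1 - ρ)), single linkage, and a filtered correlation rebuilt from the cophenetic (ultrametric) distances. The data are synthetic two-block returns invented for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, leaves_list, linkage
from scipy.spatial.distance import squareform

# Synthetic returns: two blocks of assets, each driven by its own factor,
# with columns shuffled so that the block structure is hidden.
rng = np.random.default_rng(0)
T, n_per_block = 250, 5
factors = rng.standard_normal((T, 2))
loadings = np.kron(np.eye(2), np.ones((1, n_per_block)))  # shape (2, 10)
returns = factors @ loadings + 0.8 * rng.standard_normal((T, 2 * n_per_block))
returns = returns[:, rng.permutation(2 * n_per_block)]

# (left) empirical correlation matrix
corr = np.corrcoef(returns, rowvar=False)

# Correlation turned into a distance, then single-linkage hierarchical clustering.
dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="single")

# (center) same matrix with rows and columns reordered by the dendrogram leaves
order = leaves_list(Z)
seriated = corr[np.ix_(order, order)]

# (right) correlations rebuilt from the ultrametric distances of the tree
ultrametric = squareform(cophenet(Z))
filtered = 1.0 - ultrametric ** 2 / 2.0
```

With single linkage, the cophenetic distance never exceeds the original distance, so each filtered correlation is at least as large as the empirical one; other linkages would give a different filter.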
N.B. Other applications: statistical arbitrage, alternative risk measures
Why stability?
statistical consistency of the clustering method requires assumptions that may not hold in practice: e.g. returns are i.i.d., underlying elliptical copula, enough data is available
stability is a weaker property: reproducibility of results across a wide range of slight data perturbations
Clusters obtained at times t, t + 1, t + 2: is the difference between successive clusters a “true” signal?
Is the clustering of financial time series stable?
According to [2], clusters are not stable with respect to the clustering algorithm, but only a squared Euclidean distance was considered, which is not relevant for clustering assets from their returns (cf. [4]).
Idea: A more relevant distance should increase stability
We investigate the clustering stability resulting from using:
a Euclidean distance
a Pearson correlation distance [3]
a Spearman correlation distance
a distance for comparing two dependent random variables [4]
Some usual distances for clustering financial time series
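Prices of asset i: $(P^i_t)_{t \geq 0}$
Log-returns: $S^i_{t+1} = \log P^i_{t+1} - \log P^i_t$, giving the series $(S^i_t)_{t \geq 1}$
Euclidean distance: $d(S^i, S^j) = \sum_{t=1}^{T} (S^i_t - S^j_t)^2$
Pearson correl.: $\rho(S^i, S^j) = \dfrac{\sum_{t=1}^{T} (S^i_t - \bar{S}^i)(S^j_t - \bar{S}^j)}{\sqrt{\sum_{t=1}^{T} (S^i_t - \bar{S}^i)^2}\,\sqrt{\sum_{t=1}^{T} (S^j_t - \bar{S}^j)^2}}$
Spearman correl.: $\rho_S(S^i, S^j) = 1 - \dfrac{6}{T(T^2 - 1)} \sum_{t=1}^{T} \left(S^i_{(t)} - S^j_{(t)}\right)^2$, where $S^i_{(t)}$ is the rank of $S^i_t$ among the $T$ observations of series $i$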
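The three quantities above translate directly into NumPy. The sketch below follows the formulas as written on the slide (the Euclidean distance is left squared, and the Spearman formula assumes no ties); the function names and the synthetic price paths are illustrative assumptions.

```python
import numpy as np

def log_returns(prices):
    """S_{t+1} = log P_{t+1} - log P_t, computed along the time axis."""
    return np.diff(np.log(np.asarray(prices, dtype=float)), axis=0)

def squared_euclidean(x, y):
    return float(np.sum((x - y) ** 2))

def pearson(x, y):
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

def ranks(x):
    """Ranks 1..T of the observations (assumes no ties)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman(x, y):
    T = len(x)
    return float(1.0 - 6.0 * np.sum((ranks(x) - ranks(y)) ** 2) / (T * (T ** 2 - 1)))

# Two synthetic, positively related price paths.
rng = np.random.default_rng(0)
common = rng.standard_normal(500)
prices_i = 100.0 * np.exp(np.cumsum(0.01 * (common + 0.5 * rng.standard_normal(500))))
prices_j = 50.0 * np.exp(np.cumsum(0.02 * (common + 0.5 * rng.standard_normal(500))))

s_i, s_j = log_returns(prices_i), log_returns(prices_j)
print(squared_euclidean(s_i, s_j), pearson(s_i, s_j), spearman(s_i, s_j))
```

Note that the Euclidean value depends on the scale of each return series, while the two correlations do not; Spearman is additionally unchanged by any increasing transformation of either series.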
Generic Non-Parametric Distance [4]
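$d_\theta^2(X_i, X_j) = \theta \, 3\,\mathbb{E}\left[\,|P_i(X_i) - P_j(X_j)|^2\,\right] + (1 - \theta)\, \dfrac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\dfrac{dP_i}{d\lambda}} - \sqrt{\dfrac{dP_j}{d\lambda}} \right)^2 d\lambda$
(i) $0 \leq d_\theta \leq 1$, (ii) for $0 < \theta < 1$, $d_\theta$ is a metric, (iii) $d_\theta$ is invariant under diffeomorphism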
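A rough empirical version of this distance can be written in a few lines. The estimator below is an assumption made for illustration and may differ from the one used in [4]: the cumulative distribution functions are replaced by normalised ranks, and the Hellinger term is computed between two histograms on common bins (the bin count is arbitrary).

```python
import numpy as np

def empirical_cdf_values(x):
    """Empirical CDF evaluated at each observation: rank / T (assumes no ties)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r / len(x)

def d_theta_squared(x, y, theta=0.5, bins=30):
    # Dependence part: 3 E[|P_i(X_i) - P_j(X_j)|^2], with empirical CDFs.
    dependence = 3.0 * np.mean((empirical_cdf_values(x) - empirical_cdf_values(y)) ** 2)
    # Distribution part: squared Hellinger distance between two histograms
    # built on common bins (a crude density estimate, chosen for brevity).
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    p = np.histogram(x, bins=edges)[0] / len(x)
    q = np.histogram(y, bins=edges)[0] / len(y)
    hellinger2 = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)
    return float(theta * dependence + (1.0 - theta) * hellinger2)

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = 0.7 * x + 0.3 * rng.standard_normal(1000)  # dependent on x
z = 5.0 * x                                    # comonotone with x, different scale

print(d_theta_squared(x, y))
print(d_theta_squared(x, z, theta=1.0))  # dependence only: ranks coincide
print(d_theta_squared(x, z, theta=0.0))  # distribution only: scales differ
```

The pair (x, z) shows why both terms are combined: the two variables are perfectly dependent, yet their distributions differ, and only the θ < 1 part of the distance can see that.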
Generic Non-Parametric Distance [4]
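$d_0^2$: $\dfrac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\dfrac{dP_i}{d\lambda}} - \sqrt{\dfrac{dP_j}{d\lambda}} \right)^2 d\lambda = \mathrm{Hellinger}^2$
$d_1^2$: $3\,\mathbb{E}\left[\,|P_i(X_i) - P_j(X_j)|^2\,\right] = \dfrac{1 - \rho_S}{2} = 2 - 6 \int_0^1 \int_0^1 C(u, v)\, du\, dv$
Remark: If
$f(x, \theta) = c(F_1(x_1; \nu_1), \ldots, F_N(x_N; \nu_N); \theta_c) \prod_{i=1}^{N} f_i(x_i; \nu_i)$
then with CML hypothesis
$ds^2 = ds^2_{\mathrm{copula}} + \sum_{i=1}^{N} ds^2_{\mathrm{margins}}$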
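The identity for $d_1^2$ can be checked in a few steps. The derivation below assumes continuous margins, so that $U = P_i(X_i)$ and $V = P_j(X_j)$ are uniform on $[0, 1]$ with copula $C$; the symbols $U$ and $V$ are introduced here only for the computation.

```latex
\begin{align*}
3\,\mathbb{E}\big[(U - V)^2\big]
  &= 3\big(\mathbb{E}[U^2] + \mathbb{E}[V^2] - 2\,\mathbb{E}[UV]\big)
   = 3\Big(\tfrac{1}{3} + \tfrac{1}{3} - 2\,\mathbb{E}[UV]\Big)
   = 2 - 6\,\mathbb{E}[UV], \\
\mathbb{E}[UV] &= \int_0^1\!\!\int_0^1 C(u, v)\,du\,dv,
  \qquad \rho_S = 12\,\mathbb{E}[UV] - 3, \\
\Rightarrow\quad 3\,\mathbb{E}\big[(U - V)^2\big]
  &= 2 - 6\int_0^1\!\!\int_0^1 C(u, v)\,du\,dv
   = 2 - \frac{\rho_S + 3}{2}
   = \frac{1 - \rho_S}{2}.
\end{align*}
```

The middle line uses Hoeffding's covariance formula for $\mathbb{E}[UV]$ and the usual expression of Spearman's $\rho_S$ as the Pearson correlation of $U$ and $V$.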
1 Introduction to financial time series clustering
2 Empirical results from the clustering stability study
3 Conclusion
Sliding Window
PCA stability curve (red) vs. Euclidean clusters stability curve as a function of time, using results from [1] for fair comparison: clusters are more stable
most basic perturbation: traders face it every day when monitoring their indicators
we do not want to overfit our analysis to this particular stability goal
stability performance: dist. [4] ≃ Spearman > Pearson > Euclidean
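A sliding-window stability curve can be sketched as follows. Several choices here are assumptions made for the example rather than the protocol of the study: average-linkage clustering on the distance sqrt(2(1 - ρ)), a fixed number of clusters, the Rand index as agreement score between consecutive windows, and synthetic block-structured returns.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_labels(returns, n_clusters):
    """Hierarchical clustering of the assets (columns) from their correlations."""
    corr = np.corrcoef(returns, rowvar=False)
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

def rand_index(a, b):
    """Share of asset pairs on which two partitions agree (same / different cluster)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] == same_b[iu]))

def sliding_window_stability(returns, window, step, n_clusters):
    """Agreement between the clusterings of consecutive sliding windows."""
    starts = range(0, returns.shape[0] - window + 1, step)
    labels = [cluster_labels(returns[s:s + window], n_clusters) for s in starts]
    return [rand_index(p, q) for p, q in zip(labels[:-1], labels[1:])]

# Synthetic returns: 3 blocks of 4 assets, each block driven by its own factor.
rng = np.random.default_rng(0)
T, n_blocks, n_per_block = 300, 3, 4
factors = rng.standard_normal((T, n_blocks))
loadings = np.kron(np.eye(n_blocks), np.ones((1, n_per_block)))
returns = factors @ loadings + 0.5 * rng.standard_normal((T, n_blocks * n_per_block))

scores = sliding_window_stability(returns, window=100, step=50, n_clusters=3)
print(scores)
```

Swapping the distance inside `cluster_labels` is the natural way to compare the stability obtained with different distances on the same windows.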
Odd vs. Even
A clustering algorithm applied on two samples describing the same phenomenon should yield the same results.
How to obtain two of these samples?
(Un)stability of clusters with L2 distance
Stability of clusters with the proposed distance [4]
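One reading of the slide title is to split the observations into odd-indexed and even-indexed dates and cluster each half separately. The sketch below illustrates that reading under stated assumptions: average-linkage clustering on 1 minus the Pearson correlation, a fixed number of clusters, and synthetic block-structured returns; the helper names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def odd_even_split(returns):
    """Two interleaved samples of the same series: even-indexed and odd-indexed rows."""
    return returns[0::2], returns[1::2]

def cluster(returns, n_clusters):
    # pdist with metric="correlation" returns 1 - Pearson correlation
    # between the rows it is given, here the assets.
    d = pdist(returns.T, metric="correlation")
    return fcluster(linkage(d, method="average"), t=n_clusters, criterion="maxclust")

def same_partition(a, b):
    """True if two labelings group the assets identically, up to label names."""
    return bool(np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :]))

# Synthetic returns: 3 blocks of 4 assets, each block driven by its own factor.
rng = np.random.default_rng(0)
T, n_blocks, n_per_block = 400, 3, 4
factors = rng.standard_normal((T, n_blocks))
loadings = np.kron(np.eye(n_blocks), np.ones((1, n_per_block)))
returns = factors @ loadings + 0.5 * rng.standard_normal((T, n_blocks * n_per_block))

even, odd = odd_even_split(returns)
labels_even, labels_odd = cluster(even, 3), cluster(odd, 3)
print(same_partition(labels_even, labels_odd))
```

Both halves cover the same calendar period, so any disagreement between the two partitions reflects sampling noise rather than a change in the market.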
Economic Regimes
AXA 5-year CDS spread over 2006-2015
Average of the pairwise correlations; correlation skyrockets during crises
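The average pairwise correlation is simple to compute over a rolling window. The sketch below uses synthetic data in which a "calm" regime with a weak common factor is followed by a "crisis" regime with a strong one; the regime construction, window length and function names are assumptions made for the example.

```python
import numpy as np

def average_pairwise_correlation(returns):
    """Mean of the off-diagonal entries of the correlation matrix."""
    corr = np.corrcoef(returns, rowvar=False)
    iu = np.triu_indices(corr.shape[0], k=1)
    return float(corr[iu].mean())

def rolling_average_correlation(returns, window):
    """Average pairwise correlation on each window of consecutive observations."""
    T = returns.shape[0]
    return np.array([
        average_pairwise_correlation(returns[s:s + window])
        for s in range(T - window + 1)
    ])

rng = np.random.default_rng(0)
T, N = 200, 10
# Calm regime: assets load weakly on a common factor.
calm = 0.3 * rng.standard_normal((T, 1)) + rng.standard_normal((T, N))
# Crisis regime: the common factor dominates.
crisis = 1.5 * rng.standard_normal((T, 1)) + rng.standard_normal((T, N))

returns = np.vstack([calm, crisis])
curve = rolling_average_correlation(returns, window=60)
print(average_pairwise_correlation(calm), average_pairwise_correlation(crisis))
```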
Is the clustering structure persistent?
Economic Regimes Clustering Stability
Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)
Heart vs. Tails Clustering Stability
≈ orange+red vs. green+yellow periods
Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)
Multiscale
Is the clustering structure persistent across different sampling frequencies?
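Changing the sampling frequency of log-returns amounts to summing them over non-overlapping blocks. The sketch below shows that aggregation on synthetic daily returns; the block length of 5 (roughly weekly) and the function name are illustrative assumptions.

```python
import numpy as np

def aggregate_log_returns(returns, k):
    """Sum log-returns over non-overlapping blocks of k observations.

    Trailing observations that do not fill a complete block are dropped.
    """
    n_blocks = returns.shape[0] // k
    trimmed = returns[: n_blocks * k]
    return trimmed.reshape(n_blocks, k, returns.shape[1]).sum(axis=1)

rng = np.random.default_rng(0)
daily = 0.01 * rng.standard_normal((250, 4))  # 250 days, 4 assets
weekly = aggregate_log_returns(daily, k=5)
print(daily.shape, weekly.shape)
```

The clustering can then be run on `daily`, `weekly`, and coarser aggregates, and the resulting partitions compared.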
Multiscale Clustering Stability
Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)
Maturities & Term Structure
An asset is described by several time series whose dynamics are similar: Nokia Oyj is described here by the cost of insurance against its default for {1, 3, 5, 7, 10} years
Maturities & Term Structure Clustering Stability
Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)
1 Introduction to financial time series clustering
2 Empirical results from the clustering stability study
3 Conclusion
Discussion and questions?
A given clustering algorithm yields a particular clustering structure, but with a relevant distance it can be more stable
The perturbations presented can be readily extended (e.g. using different CDS datasets)
Disclosing stability results is interesting since complex models often perform poorly (the many parameters are somewhat overfitted) and cannot be used by practitioners
Correlation+distribution distance (presented in [4]) may work for your applications (which ones?)
[1] C. Ding and X. He. K-means clustering via principal component analysis. In Proceedings of the Twenty-First International Conference on Machine Learning, page 29. ACM, 2004.
[2] V. Lemieux, P. S. Rahmdel, R. Walker, B. Wong, and M. Flood. Clustering techniques and their effect on portfolio formation and risk analysis. In Proceedings of the International Workshop on Data Science for Macro-Modeling, pages 1–6. ACM, 2014.
[3] R. N. Mantegna and H. E. Stanley. Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge University Press, 1999.
[4] G. Marti, P. Very, and P. Donnat. Toward a generic representation of random variables for machine learning. Pattern Recognition Letters, 2015.