Functional Data Analysis and Cluster Analysis: a Marriage...

43
Functional Data Analysis and Cluster Analysis: a Marriage with some Constraints Simone VANTINI joint with L.M. SANGALLI, P. SECCHI, V. VITELLI 28th November 2014, Bozen-Bolzano MOX Dept. of Mathematics, Politecnico di Milano

Transcript of Functional Data Analysis and Cluster Analysis: a Marriage...

Page 1: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

Functional Data Analysis and Cluster Analysis:

a Marriage with some Constraints

Simone VANTINI

joint with L.M. SANGALLI, P. SECCHI, V. VITELLI

28th November 2014, Bozen-Bolzano

MOX – Dept. of Mathematics, Politecnico di Milano

Page 2: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

A marriage… 2

Cluster Analysis

(Identification of groups)

Functional Data Analysis

(Analysis of functions)

Functional Data Clustering

(Identification of groups of functions)

+

Page 3: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

… with some constraints 3

There are indeed some serious issues going on in this marriage:

• No model at hand

(probability density function does not exist,

basically impossible to assess the validity of the model)

• Choice of the Smoothing

(functions and their derivatives need to be estimated from

point-wise noisy evaluations)

• Choice of the Metric

(huge variety of distances wrt to the multivariate framework)

• Choice of the Group of Warping Functions

(data should generally be horizontaly aligned)

All

Co

nn

ecte

d!

K-mean Clustering of Misaligned Data Using a Derivative-based Metric

Page 4: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

Choice of the Smoothing 4

Same row data (smoothed using B-splines with different number of knots)

Weak Smoothing

One Cluster

Strong Smoothing

Two Clusters

Figures are courtesy of Secchi, P., Vantini, S., Vitelli, V. (2013): " Bagging Voronoi classifiers for clustering spatial

functional data", International Journal of Applied Earth Observation and Geoinformation, Vol. 22, pp. 53-64

Page 5: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

Choice of the Metric 5

• From an L2 perspective the two datasets shows the same

variability across curves.

• From an H1 perspective the two datasets shows a different

variability across curves (lower the former, larger the latter).

• : 1D i.i.d. random variable

• : Fourier basis

Figures are courtesy of A. Menafoglio; P. Secchi; M. Dalla Rosa (2013), “A Universal Kriging predictor for spatially

dependent functional data of a Hilbert Space”. Electronic Journal of Statistics 7, 2209–2240

Page 6: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

6 A Snapshot at Functional Data Registration/Alignment

Registration of a set of functions

Find suitable warping functions h1,…, hn such that c1◦h1,…, cn◦hn are the most similar.

Landmark Approach (similar means that functions are warped along the

x-axis such that each (known) landmark occurs at the same point along

the x-axis)

Continuous Approach (similar means that functions are warped along

the x-axis such that for each point along the x-axis functions present

close values along the y-axis)

Page 7: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

Choice of the Group of Warping Functions 7

0 1 2 3 4 5

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Time

Spik

es_

Inte

nsity

Data Clustered

0 1 2 3 4 5

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Not Aligned Curves

Four Clusters

Aligned Curves

Two Clusters

200 periodic curves (spike-trains of neuronal activity)

Figures are courtesy of Patriarca, M., Sangalli, L.M., Secchi, P., Vantini, S.: "Analysis of Spike Train Data: an Application of K-mean

Alignment“, Electronic Journal of Statistics, 8, 1769-177, Special Section on Statistics of Time Warpings and Phase Variations

Page 8: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

The Algorithm

A Toy Example

The Theory

The Case Study

8

Clustering of Misaligned

Functional Data

Outline

Page 9: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

9

The Algorithm

Page 10: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

Framework 10

K-mean

Clustering

K-mean

Alignment

Continuous

Alignment

Functional Clustering Alignment / Registration

Page 11: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

11 Goals

Goal of Continuous Alignment:

Decoupling Phase and Amplitude Variability

Goal of K-mean Clustering:

Decoupling Within and Between-cluster (Amplitude) Variability

Goal of K-mean Alignment:

Decoupling Phase Variability, Within-cluster Amplitude Variability,

and Between-cluster Amplitude Variability

Page 12: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

12 Continuous Alignment

(e.g. Sangalli, Secchi, Vantini, Veneziani 2009)

Estimate the template

curve c0

For each curve,

find hi that maximizes similarity

between template curve c0

and patient warped curve ci ◦ hi

n unregistered curves

n registered curves

and n warping functions

Maximization

Estimation

Continuous

Alignment

Page 13: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

13 K-mean Clustering

(e.g. Tarpey and Kinateder 2003)

Estimate the K centroids

curves {c0k}k=1,2,..,K

Assign ci to the k-th cluster

if the similarity

between c0k and ci

is maximal over k = 1, 2, …, K

n unregistered curves

K clusters

Assignment

Estimation

K-mean

Clustering

Page 14: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

14 K-mean Clustering vs Continuous Alignment

Estimate the K centroids

curves {c0k}k=1,2,..,K

Assign ci to the k-th cluster

if the similarity

between c0k and ci

is maximal over k = 1, 2, …, K

n unregistered curves

K clusters

Assignment

Estimation

Estimate the template

curve c0

For each curve,

find hi that maximizes similarity

between template curve c0

and patient warped curve ci ◦ hi

n unregistered curves

n registered curves

and n warping functions

Maximization

Estimation

K-mean

Clustering

Continuous

Alignment

Page 15: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

15 K-mean Alignment

Estimate the K template/centroid curves {c0k}k=1,2,..,K

For each curve, find hik that maximizes similarity between

each template curve c0k and the candidate warped curve ci ◦ hi

k

n unregistered curves

n registered curves, n warping functions and K clusters

Maximization

Estimation

Assign ci to the k-th cluster if the similarity between c0k and ci ◦ hi

k

is maximal over k = 1, 2, …, K and then warp ci along hi = hi k

Assignment

K = 1 Continuous Registration Algorithm (e.g. Sangalli et al. 2009)

W = {1} Functional K-mean Clustering (e.g. Tarpey and Kinateder 2003)

Page 16: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

Functional Clustering

K-mean Alignment:

A Double-Facet Method

16

Alignment / Registration

K-mean

Clustering

K-mean

Alignment

Continuous

Alignment

It is a K-mean Clustering

Algorithm

where warping is allowed

It is an Alignment Algorithm

with K templates

Page 17: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

17

A Toy Example

Page 18: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

18 A Simulated Toy Example:

Simulation Details

2 Amplitude Clusters (2 template curves)

with further clustering in the phase

Variability in both

Amplitude and Phase

Page 19: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

19

Aligned and clustered curves Warping functions

K = 2

K = 1

K = 3

A Simulated Toy Example:

Algorithm Results

X

V

V

Page 20: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

20 A Simulated Toy Example:

Algorithm Results

Boxplots of the similarity indices

between curves and templates

Means of the similarity indices

between curves and templates

Page 21: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

21

The Theory

Page 22: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

Abstract Visualization of Alignment 22

Original

Functions

Aligned

Functions

Equivalence

Classes

generated by

the Action of the

Group of

Warping

Functions

Page 23: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

Meta-Equivalence Result 23

Analysis of a

registered functional data set

with respect to the metric d

Analysis of a

set of equivalence classes

(induced by the application of W

to the original functions)

with respect to a new metric dF

(jointly defined by d and W). =

Practice Theory

(e.g., K-mean Alignment

in the Functional Space)

(e.g., K-mean Clustering

in the Quotient Space)

Page 24: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

Meta-Equivalence Theorem 24

Page 25: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

A Topological Characterization of Phase and

Amplitude Variability 25

The introduction of

a metric/semi-metric d and of a group W of warping functions,

with respect to which the metric/semi-metric is invariant,

enables a not ambiguous definition of phase and amplitude variability.

Total Variability (variability between elements of F)

Amplitude Variability (variability between equivalence classes)

Phase Variability (variability within equivalence classes)

Page 26: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

Ancillary variability

26

Hierarchy of Quotient Spaces

Functions belonging to

are grouped in

equivalence classes belonging to

that are grouped in

equivalence classes belonging to

Page 27: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

Some examples of W-invariant semi-metrics

27

H-translations Ø

H-translations V-translations

H-translations V-translations

H-translations V-translations

V-linear trends

H-translations

H-dilations V-dilations

H-translations

H-dilations

V-translations

V-dilations

… … …

H-diffeomorphisms V-translations

Metric / Semi-metric

Maximal W

(Phase Variability) Ancillary Variability

Page 28: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

28

The Case Study

Page 29: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

29 Cerebral Aneurysms

Cerebral aneurysms: malformations of cerebral arteries, in

particular of arteries belonging to or connected to the

Circle of Willis.

EPIDEMIOLOGICAL STATISTICS • Incidence rate of cerebral aneurysms:

1/20 people

• Incidence rate of ruptured cerebral aneurysms per year:

1/10000 people per year

• Mortality due to a ruptured aneurysm:

> 50%: Out of 9 patients with a ruptured aneurysm:

3 are expected to die before arriving at the hospital

2 to die after having arrived at the hospital

2 to survive with permanent cerebral damages

2 to survive without permanent cerebral damages

Page 30: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

30 Pathological Classification

Upper High Aneurysm

Lower Low Aneurysm

Healthy No Aneurysm

33

25

7

Page 31: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

31 From X-rays to Centerlines

Surface Points Voronoi Diagram Eikoinal Equation Centerline

3d-array

(one slice) Gradient 3d-array

(one slice)

X-rays

(one direction) Contrast Fluid

Injections

1 2 3 4

5 6 7 4

Page 32: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

32 Discrete Data

xij Left-Right

yij Top-Down

zij Front-Back

Rij MISR

sij Abscissa

Observational Study conducted at Ospedale Ca’ Granda Niguarda – Milano

relative to 65 patients hospitalized from September 2002 to October 2005.

Page 33: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

33 Functional Data

The sample of 65 ICA: each patient is represented

by the centerline of their ICA

Page 34: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

34 ICA centerline First Derivatives

Page 35: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

35 K-mean Alignment:

Theoretical Choices

Similarity Index between Curves Group of Warping Functions

Properties Minimal

Joint

Properties

Page 36: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

36 K-mean Alignment Performances

Page 37: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

37 One-mean Alignment

Page 38: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

38 One-mean Alignment: aneurysm location

on registered ICA radius and curvature profiles

Unregistered Radius and Curvature Profiles Registered Radius and Curvature Profiles

Page 39: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

39 One-mean Alignment

Page 40: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

40 Two-mean Alignment

Page 41: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

41 Two-mean Alignment vs Two-mean Clustering

Tw

o-m

ean

Ali

gn

men

t T

wo

-mean

Clu

ste

rin

g

Page 42: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

42 Clustering

Two-mean Alignment Clusters that are

morphologically different

30 S-shaped ICAs

vs

35 Ω-shaped ICAs

Krayenbuehl et al. (1982)

No

Aneurysm

Aneurysm

along ICA

Aneurysm

downstream ICA

S-shaped ICAs 100% 52% 30%

Ω-shaped ICAs 0% 48% 70%

P-value of Pearson’s Chi-squared test for independence equal to 0.0013

Fluid-dynamical interpretation of the onset of cerebral aneurysms

Page 43: Functional Data Analysis and Cluster Analysis: a Marriage ...pro1.unibz.it/projects/Clustering_Methods_2014/Vantini.pdf · along ICA Aneurysm downstream ICA S-shaped ICAs 100% 52%

[email protected]

43 References

• Ramsay, J. O., Silverman, B. W. (2005):

Functional Data Analysis, 2nd edition, Springer New York NY .

• Sangalli, L. M., Secchi, P., Vantini, S., Vitelli, V. (2010):

“K-mean Alignment for Curve Clustering”,

Computational Statistics and Data Analysis, Vol. 54., pp. 1219-1233

• Vantini, S. (2013):

“On the Definition of Phase and Amplitude Variability in Functional Data Analysis”,

Test, Vol. 21(4), pp. 676-696.

• Sangalli, L. M., Secchi, P., Vantini, S. (2014):

“Analysis of AneuRisk65 data: k-mean alignment” [with discussion and rejoinder],

Electronic Journal of Statistics, 8, 1891–1904, Special Section on Statistics of Time

Warpings and Phase Variations.

• Krayenbuehl, H., Huber, P., Yasargil, M. G. (1982),

Krayenbuhl/Yasargil Cerebral Angiography, Thieme Medical Publishers, 2nd ed.

• AneuRisk65 data are freely downloadable at http://mox.polimi.it/it/progetti/aneurisk/

http://ecm2.mathcs.emory.edu/aneuriskweb/a65

• Parodi, A., Patriarca, M., Sangalli, L. M., Secchi, P., Vantini, S., Vitelli, V. (2014): fdakma: Clustering and alignment of a given set of curves,

R package version 1.1.1.