SETN 2002, April 11 2002, Thessaloniki, Greece -- George Potamias, ICS/FORTH Combined Distance &...


Page 1

Combined Distance and Feature-Based Clustering of Time-Series: An Application on Neurophysiology

George Potamias

Institute of Computer Science, FORTH

Heraklion, Crete

SETN 2002, April 10-12 2002

Thessaloniki, Greece

Page 2

The Application Domain

Brain development: a series of events, including cell proliferation and migration, growth of axons and dendrites, formation of functional connections and synapses, cell death, myelination of axons, and refinement of neuronal specificity.

Adult brain: a complex network of fibers; brain nuclei are its functional structures.

Knowledge of the mechanisms that govern these complex processes, together with the study of histogenesis and neural plasticity during brain development, is critical for understanding the function of the normal or injured brain.

Page 3

Study & Goal

The late embryonic development of the avian brain was selected for this study.

Biosynthetic activity, such as protein synthesis, underlies brain-development events. The history of in vivo protein-synthesis activity of specific brain areas could yield insight into their pattern of maturation, reveal relationships between distantly located structures, and suggest different roles for the topographically organized brain structures in the maturation processes.

(Figure: avian brain)

Study: the time course of protein-synthesis activity of individual brain areas, used as a model for correlating critical periods during development.

Goal: extract the critical relationships that govern the normal ontogenic processes.

Page 4

Biomedical Background

Late embryonic development, from day 11 (E11) to day 19 (E19), as well as post-hatching day 1 (P1), was studied.

During this period neuronal proliferation has ceased, while cell growth, differentiation, migration and death, axon elongation, refinement of connections, and the establishment of functional neuronal networks take place.

Biosynthetic activity was determined with the in vivo autoradiographic method using carboxyl-labeled L-leucine (an essential amino acid present in most proteins).

The experimental data concern 30 chick embryos.

Page 5

Time-Series Representation

49 brain areas (nuclei) were identified, and their intensities were measured by image analysis of the autoradiographic film. For each area, the mean intensity over all chicks was recorded.

(Figure: protein-synthesis patterns; intensities, axis 90 to 190, plotted against days E11, E13, E15, E17, E19, P1)

The final outcome is a set of 49 time-series over a span of 6 time points (five embryonic days and one post-hatching day).

Page 6

The Problem

(Figure: the 49 protein-synthesis time-series, one per brain area, plotted over days E11 to P1; intensity axis 40 to 290)

How can we get meaning from this mesh?

How can we get indicative developmental patterns?

Page 7

Method: Discovery of Coherences between Time-Series

(Figure: the time-series collection, as on the previous page)

Time-series discretization

Compute distances (similarities)

Distance & feature-based hierarchical clustering (… the need for hierarchical modeling)

Visualize and interpret the clustering result(s)

Induce the underlying/hidden models: the brain-development hierarchy

Page 8

Time-Series Matching: Problems & Tasks

Applying a standard distance metric directly runs into outliers, and into series with different scaling factors and baselines.

Hence the need for an adjustable and adaptive time-series matching operation that can ignore small or non-significant parts, translate the offset (vertical alignment), and scale amplitudes to a fixed width, before the matching metric is applied.
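The baseline and scaling problem can be made concrete. The sketch below is not QDT: it uses standard z-normalization, a commonly used alternative, to show how a plain Euclidean distance is dominated by baseline and amplitude differences between two series of identical shape. The series values are hypothetical.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def z_normalize(series):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # constant series: no shape information to keep
    return [(x - mean) / std for x in series]

# Two hypothetical series with the same shape but different baseline and amplitude.
a = [100, 120, 110, 150, 140, 160]
b = [2 * x + 50 for x in a]

raw = euclidean(a, b)                                  # large: driven by offset/scale
normalized = euclidean(z_normalize(a), z_normalize(b))  # near zero: same shape
print(raw, normalized)
```

After normalization the two series are treated as (numerically) identical, which is the behavior the matching operation above asks for with respect to offset and amplitude; it does nothing about outliers or non-significant parts.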

Page 9

Time-Series Discretization

QDT: Qualitative Discrete Transformation (Lopez et al., 2000). A new continuous value is assigned the same discrete value as its preceding values if it belongs to the same population (based on statistical-significance testing). The number of discrete intervals is specified by the user.

QDT achieves, in a convenient way, amplitude scaling, vertical alignment, and identification of (non-)significant parts.

Example: 4 intervals = 4 nominal values, v1: drastic increase, v2: increase, v3: decrease, v4: drastic decrease; a discretized series then reads v2 v2 v2 v2 v4 v1 v3 …
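A minimal sketch of the "same population" rule, under stated assumptions: the noise standard deviation sigma is known, and the significance test is a two-sided z-test at roughly the 5% level (z_crit = 1.96). The function name qdt_like_segments and both parameters are hypothetical. This is not Lopez et al.'s QDT: it neither caps the number of discrete values at the user-specified count nor reuses a symbol when a series returns to an earlier level.

```python
import math

def qdt_like_segments(series, sigma, z_crit=1.96):
    """Give each value a segment label. A value keeps the current label unless a
    z-test (known noise std `sigma`) rejects that it comes from the same
    population as the preceding values of the current segment."""
    labels = []
    current = []  # values of the current population
    label = 0
    for x in series:
        if current:
            k = len(current)
            mean = sum(current) / k
            # Std of (x - mean) if x and the k previous values share a population.
            se = sigma * math.sqrt(1 + 1 / k)
            if abs(x - mean) / se > z_crit:
                label += 1    # significant jump: start a new population
                current = []
        current.append(x)
        labels.append(label)
    return labels

print(qdt_like_segments([100, 102, 99, 130, 131, 129, 90], sigma=3))
```

Small fluctuations (100, 102, 99) stay in one population, while the jumps to about 130 and then to 90 each open a new one.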

Page 10

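Discretization specifics

For a time-series T: {X1, X2, …, Xn}, with s the number of discrete values:

$$\mathit{width} = \frac{\max_t\{X_t\} - \min_t\{X_t\}}{s}$$

$$v_i = \mathrm{discr}(X_i) = \begin{cases} s & \text{if } X_i = \max_t\{X_t\} \\ \left\lfloor \dfrac{X_i - \min_t\{X_t\}}{\mathit{width}} \right\rfloor + 1 & \text{otherwise} \end{cases}$$

Discrete transform of T: T': {v1, v2, …, vm}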
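The equal-width rule above translates directly into code. A minimal sketch; the example series is hypothetical, and a constant series (zero width) is rejected, a case the formulas leave undefined.

```python
import math

def discr(series, s):
    """Equal-width discretization into symbols 1..s:
    width = (max - min) / s, v_i = floor((X_i - min) / width) + 1,
    with the maximum value mapped to s."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("constant series: width would be zero")
    width = (hi - lo) / s
    values = []
    for x in series:
        v = math.floor((x - lo) / width) + 1
        # min() only guards against floating-point round-off near the maximum
        values.append(s if x == hi else min(v, s))
    return values

print(discr([90, 110, 130, 150, 170, 190], s=4))
```

With min 90, max 190 and s = 4, the width is 25, so 110 falls in interval 1 and 130 in interval 2.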

Page 11

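Distance Metric

$$\mathrm{dist}(T_a, T_b) = \mathrm{dist}(T'_a, T'_b) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{distance}(v_{a,i}, v_{b,i})$$

$$\mathrm{distance}(v_{a,i}, v_{b,i}) = \begin{cases} 1 & \text{if } v_{a,i} \neq v_{b,i} \\ 0 & \text{otherwise} \end{cases}$$

(Other matching approaches: DTW, segmentation)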
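The distance is the fraction of time points at which two discrete transforms disagree. A minimal sketch, assuming both transforms have the same length n; the example transforms are hypothetical.

```python
def dist(va, vb):
    """Normalized mismatch count: (1/n) * sum_i [v_{a,i} != v_{b,i}]."""
    if len(va) != len(vb):
        raise ValueError("discrete transforms must have equal length")
    n = len(va)
    return sum(1 for x, y in zip(va, vb) if x != y) / n

print(dist([1, 1, 2, 3, 4, 4], [1, 2, 2, 3, 3, 4]))
```

The two transforms differ at 2 of 6 positions, so their distance is 2/6.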

Page 12

Graph Theoretic Hierarchical Clustering: The Basics

Time-series are the nodes; the time-series distance dist(Ta,Tb) weights the edge between them, giving a fully connected weighted graph.

Minimum Spanning Tree: preserves the minimum distance between time-series and offers the ability to 'isolate' and group nodes.

Iterative partitioning: … which sub-group to form? … when to stop?

Category Utility: a probabilistic metric that guides these decisions, until the STOP condition yields the hierarchical clustering.
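A minimal sketch of these graph-theoretic basics, assuming the pairwise distances are already in a square matrix: Prim's algorithm builds the minimum spanning tree of the fully connected graph, and removing one tree edge isolates two groups of nodes. Choosing which edge to cut is the decision GTC makes with category utility; here the heaviest edge is cut purely for illustration. The function names and the toy distance matrix are hypothetical.

```python
def prim_mst(d):
    """MST of the complete graph with edge weights d[a][b]; returns (weight, a, b) edges."""
    n = len(d)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge weight into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node not yet in the tree (O(n^2) overall).
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((d[u][parent[u]], parent[u], u))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v], parent[v] = d[u][v], u
    return edges

def split_at(edges, n, removed):
    """Remove one MST edge and return the two resulting groups of nodes."""
    adj = {i: [] for i in range(n)}
    for e in edges:
        if e != removed:
            _, a, b = e
            adj[a].append(b)
            adj[b].append(a)
    seen, stack = {removed[1]}, [removed[1]]
    while stack:  # depth-first walk of the component containing removed[1]
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return sorted(seen), sorted(set(range(n)) - seen)

# Toy distances between four hypothetical time-series.
d = [[0, 1, 5, 4], [1, 0, 4, 5], [5, 4, 0, 1], [4, 5, 1, 0]]
mst = prim_mst(d)
print(split_at(mst, len(d), max(mst)))  # cut the heaviest edge (illustration only)
```

Cutting one tree edge always yields exactly two connected groups, which is what makes the MST convenient for iterative partitioning.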

Page 13

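Category Utility

$$CU(G_1, G_2, \ldots, G_g) = \frac{1}{g}\sum_{k=1}^{g} p(G_k)\left[\sum_i\sum_j p(A_i = V_{ij} \mid G_k)^2 - \sum_i\sum_j p(A_i = V_{ij})^2\right]$$

The first term reflects the distribution of feature values if clustered; the second, the distribution of feature values if not clustered. The outer sum runs over all formed clusters, and g is the number of formed clusters.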
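The formula can be computed directly from value counts. A minimal sketch, assuming each time-series is represented by its discrete transform, so that attribute A_i is the i-th time point and V_ij ranges over the discrete values observed there; the toy clusters are hypothetical.

```python
from collections import Counter

def category_utility(clusters):
    """CU(G_1..G_g) = (1/g) * sum_k p(G_k) *
    [sum_i sum_j p(A_i = V_ij | G_k)^2 - sum_i sum_j p(A_i = V_ij)^2],
    where every object is a tuple of nominal values and A_i is its i-th position."""
    objects = [obj for group in clusters for obj in group]
    total = len(objects)
    n_attrs = len(objects[0])

    def sum_sq(group):
        # sum over attributes i and values j of p(A_i = V_ij | group)^2
        s = 0.0
        for i in range(n_attrs):
            counts = Counter(obj[i] for obj in group)
            s += sum((c / len(group)) ** 2 for c in counts.values())
        return s

    baseline = sum_sq(objects)  # feature-value distribution if NOT clustered
    g = len(clusters)
    return sum(len(gk) / total * (sum_sq(gk) - baseline) for gk in clusters) / g

good = [[(1, 1, 2), (1, 1, 2)], [(3, 4, 4), (3, 4, 4)]]  # homogeneous clusters
bad = [[(1, 1, 2), (3, 4, 4)], [(1, 1, 2), (3, 4, 4)]]   # mixed clusters
print(category_utility(good), category_utility(bad))
```

Homogeneous clusters make the within-cluster value distributions sharper than the overall one, so CU is positive; clusters that merely mirror the overall distribution score zero.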

Page 14

Stopping Criterion

Among the candidate partitions, CU(G11,G12) > CU(G21,G22), so {G11, G12} is the best partitioning.

If splitting G11 into G111 and G112 gives a current best CU(G111,G112,G12) < previous best CU(G11,G12): STOP.

If instead splitting G12 into G121 and G122 gives a current best CU(G121,G122,G11) > previous best CU(G11,G12): continue.
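The stopping rule can be sketched as a greedy loop: try every two-way split of every current cluster, keep the split with the highest score, and stop once it no longer beats the previous best. This is a sketch under assumptions, not the GTC implementation: candidate_splits and the scoring function are passed in as parameters, and toy_score in the example is a hypothetical stand-in for CU (negative within-cluster spread minus a per-cluster penalty), chosen only to keep the example short.

```python
def refine(partition, candidate_splits, score):
    """Greedily replace one cluster by its best two-way split while the score
    improves; stop as soon as the current best does not beat the previous best."""
    best_score = score(partition)
    while True:
        best_candidate, best_candidate_score = None, best_score
        for idx, cluster in enumerate(partition):
            for left, right in candidate_splits(cluster):
                trial = partition[:idx] + [left, right] + partition[idx + 1:]
                trial_score = score(trial)
                if trial_score > best_candidate_score:
                    best_candidate, best_candidate_score = trial, trial_score
        if best_candidate is None:  # current best <= previous best: STOP
            return partition, best_score
        partition, best_score = best_candidate, best_candidate_score  # continue

def contiguous_splits(cluster):
    """Hypothetical splitter: every cut point of the sorted cluster."""
    xs = sorted(cluster)
    return [(xs[:i], xs[i:]) for i in range(1, len(xs))]

def toy_score(partition):
    """Hypothetical stand-in for CU, used only for this illustration."""
    return -sum(max(c) - min(c) for c in partition) - 5 * len(partition)

print(refine([[1, 2, 3, 20, 21, 22]], contiguous_splits, toy_score))
```

The first split separates the two obvious groups; splitting either group further lowers the score, so the loop stops there.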

Page 15

GTC - Graph Theoretic Clustering: The Procedure

Complexity: $\sim O(n^2 F V)$ (preliminary estimate)

(Figure: iterative partitioning steps, each branch ending at a STOP, producing the hierarchical clustering tree)

Page 16

Patterning Brain Developmental Events: The Clusters

Cluster (# objects): brain nuclei (areas)

c1 (13): CA, CP, E, FPLa, LC, LPO, Mld, PL, PT, SP, Spi, TPc, VeM

c2 (20): Ac, CDL, DL, FPLp, GLv, IO, MM, N, NI, OcM, Ov, Rt, SM, Slu, Tov, nBOR, Loc, PA, PM, RPO

c3 (16): AM, Ad, Bas, Cpi, DM, GCt, HV, Hip, Co, POM, SL, Tn, Lli, PP, Imc, SCA

The biosynthetic activities of each cluster's brain areas, over the stamped developmental ages, exhibit no statistically significant deviation from the respective cluster mean.

So the mean of each cluster offers an indicative and representative model of the brain-developmental events, enabling the induction of critical relationships between the brain areas.
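Since each cluster's mean serves as its representative model, computing it is a per-time-point average over the cluster's members. A minimal sketch; the two series in the example are hypothetical, not data from the study.

```python
def cluster_mean(members):
    """Point-wise mean of equal-length time-series (one list per brain area)."""
    n = len(members)
    return [sum(values) / n for values in zip(*members)]

# Two hypothetical areas sampled at E11, E13, E15, E17, E19, P1.
print(cluster_mean([[150, 140, 130, 120, 125, 160],
                    [170, 150, 140, 130, 135, 180]]))
```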

Page 17

Patterning Brain Developmental Events: The Patterns

(Figure: cluster means c1, c2, c3 over days E11, E13, E15, E17, E19, P1)

C1: decrease, then increase

C2: decrease

C3: increase

Page 18

Patterning Brain Developmental Events: Hierarchical-Tree Critical Relationships

(Figure: cluster means c1, c2, c3 over days E11 to P1, alongside the hierarchical tree that separates c3 from {c1, c2})

c3: late maturation

c1, c2: early maturation, or control

Page 19

Patterning Brain Developmental Events: Biomedical Interpretation

(Figure: cluster means c1, c2, c3 over days E11 to P1)

Clusters {c1} and {c2}: second-order sensory and limbic areas. Their decline in protein synthesis reflects cell death or cell displacement due to migration, a common phenomenon in many brain regions under development. The two clusters differ significantly at the post-hatching day: {c1} areas receive sensory input and their leucine incorporation increases, whereas in {c2} leucine incorporation decreases.

Cluster {c3}: somatosensory, motor, and white-matter areas. Their increase in protein synthesis reflects myelination and motor activity.

Page 20

Conclusion & Future work

The introduced time-series mining methodology (QDT/GTC), and the respective analysis of the history of in vivo protein-synthesis activity of specific brain areas, yield insight into their maturation patterns and reveal relationships between distantly located structures.

The presented study contributes to identifying a common origin of brain structures and suggests possible homologies in the mammalian brain.

Future work includes additional formulas and procedures for computing the distance between time-series, and experimentation on other application domains to validate the approach and examine its scalability to huge collections of time-series (initial experiments on economic time-series are already in progress, with encouraging preliminary results).

Page 21

GTC on the ASL (Australian Sign Language) dataset

A subset of the dataset for the words "spend", "lose", "forget", "innocent", "norway", "happy", "later", "eat", "cold", "crazy" (Keogh and Pazzani, 1999, 3rd Conf. on Principles & Practice of Knowledge Discovery in Databases).

"One vs. another" test (figure: word-1 vs. word-2 clusterings). Results, out of 45:

Euclidean: 2
DTW: 22
SDTW: 21
QDT/GTC: 25
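Assuming that "out of 45" counts one-vs-another tests over every pair of the ten listed words (an interpretation; the slide does not state it explicitly), the total follows from 10 choose 2:

```python
from itertools import combinations

words = ["spend", "lose", "forget", "innocent", "norway",
         "happy", "later", "eat", "cold", "crazy"]
pairs = list(combinations(words, 2))  # every unordered word-1 vs. word-2 pair
print(len(pairs))  # 45
```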