
DATA ANALYSIS TOOLS FOR

IDENTIFYING AND CHARACTERIZING

DYNAMICAL TRANSITIONS

Cristina Masoller Universitat Politecnica de Catalunya

www.fisica.edu.uy/~cris

Extremes 2018

Hanover, March 2018

• Analysis tools

‒ univariate

‒ bivariate

• Examples

‒ lasers

‒ climate

Diode laser with time-delayed

optical feedback

Lasers provide “big data” for testing analysis tools

Extreme pulses

(diode laser with injection,

Thursday 11:30)

Polarization switching

VCSEL Transition to optical turbulence

Fiber laser


Video: how complex optical signals

emerge from noisy fluctuations

Laser current

In complex systems dynamical transitions

are difficult to identify and to characterize.

Example: laser with time delayed optical feedback

Time

Laser intensity

https://youtu.be/nltBQG_IIWQ


Can differences be quantified? With what reliability?

Time

Laser output intensity

Low current (noise?)

High current (chaos?)

Are weather extremes becoming more frequent?

more extreme?

Credit: Richard Williams, North Wales, UK

Physics Today, Sep. 2017

ECMWF

Strong need of data-driven reliable

analysis tools

Many methods

‒ Correlation analysis

‒ Fourier analysis

‒ Lyapunov & fractal analysis

‒ Symbolic analysis

‒ Wavelet analysis

‒ Etc. etc.

Different methods provide complementary information

The method to be used

depends on the data

− Length

− Noise

− Resolution

− Etc.

Univariate time-series analysis tools

Optical spikes Neuronal spikes

Time

How similar are these time series?

Time

Threshold crossings define “events” in a time series

Inter-spike intervals (ISIs): $T_i = t_{i+1} - t_i$

Problems:

‒ How to select

the threshold?

‒ Threshold

dependent

results?
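The threshold dependence can be illustrated with a minimal sketch. It assumes that an event is an upward crossing of the threshold and uses a made-up toy signal; the function names `spike_times` and `inter_spike_intervals` are hypothetical, not taken from the talk.

```python
def spike_times(signal, threshold, dt=1.0):
    """Times at which the signal crosses the threshold from below."""
    return [
        i * dt
        for i in range(1, len(signal))
        if signal[i - 1] < threshold <= signal[i]
    ]


def inter_spike_intervals(times):
    """T_i = t_{i+1} - t_i for consecutive event times."""
    return [t_next - t for t, t_next in zip(times, times[1:])]


# Toy signal with three pulses of different height (hypothetical data).
signal = [0, 0, 5, 0, 0, 0, 6, 0, 0, 4, 0]

times = spike_times(signal, threshold=3)
isis = inter_spike_intervals(times)

# A higher threshold misses the smallest pulse, so the ISI sequence changes.
isis_high = inter_spike_intervals(spike_times(signal, threshold=4.5))
print(isis, isis_high)
```

Comparing `isis` with `isis_high` shows why the ISI statistics should be checked for several threshold values.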

A. Longtin et al PRL (1991)

Data recorded in our lab

when a sinusoidal signal is

applied to the laser current.

Neuron data

ISI distribution indicates that neurons and lasers have

a similar response to external periodic forcing

Single auditory nerve fiber

of a squirrel monkey with a

sinusoidal sound stimulus

applied at the ear.

2T0 4T0

Laser data

A. Aragoneses et al

Optics Express (2014)

http://www.opticsinfobase.org/oe/viewmedia.cfm?URI=oe-22-4-4705&seq=0&origin=search


















A. Longtin

Int. J. Bif. Chaos (1993)

Laser ISIs Neuronal ISIs

M. Giudici et al PRE (1997)

A. Aragoneses et al

Optics Express (2014)

Return maps also suggest that neurons and lasers have

similar response to external periodic forcing

Ti

Ti+1

















ISI correlations uncover memory in neuron’s firing activity

Neuron 1 Neuron 2

{…Ti- … Ti …}

Lag

Ti

Ti+1

C

Also positive

and

alternating

ISI

correlations

have been

reported.

In the laser data, ISI correlations uncover transitions

Experiment 2: no

significant correlations

Laser current

C1

C2

Laser current

Experiment 1 (two datasets)

J. Tiana et al PRA (2010)

A. Aragoneses et al Sci. Rep. (2014)

• How to identify temporal order in the laser spikes?

• Are there more or less expressed spike patterns?

link.aps.org/pdf/10.1103/PhysRevA.82.013819














https://www.nature.com/articles/srep04696.pdf


















Symbolic ordinal analysis

Ordinal rule: if $x_i > x_{i-1}$ then $s_i = 0$; else $s_i = 1$

Sequence of 0s and 1s defined without using a threshold

D! symbols (ordinal patterns) are defined from the

temporal order of D values. D = 3: $\{\ldots, x_i, x_{i+1}, x_{i+2}, \ldots\}$

Ordinal analysis transforms a time series into “symbols”

that contain information of temporal correlations

Bandt and Pompe PRL (2002)

210 012

Drawback: information about actual values is lost.

How to quantify the information content of the symbolic

sequence?

Permutation entropy: $s_p = -\sum_i p_i \log p_i$
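A minimal sketch of the ordinal transformation and of the permutation entropy, using only the standard library. Patterns are labelled by ranks, so (0, 1, 2) is an increasing triplet and (2, 1, 0) a decreasing one. Ties are broken by order of appearance, and the entropy is divided by log(D!) so that it lies in [0, 1]; both choices are assumptions of this sketch.

```python
import math
from collections import Counter


def ordinal_patterns(x, D=3):
    """Rank pattern of every window of D consecutive values."""
    patterns = []
    for i in range(len(x) - D + 1):
        window = x[i:i + D]
        # Indices of the window sorted from smallest to largest value.
        order = sorted(range(D), key=lambda k: window[k])
        ranks = [0] * D
        for rank, k in enumerate(order):
            ranks[k] = rank
        patterns.append(tuple(ranks))
    return patterns


def permutation_entropy(x, D=3, normalize=True):
    """Shannon entropy of the ordinal-pattern probabilities."""
    counts = Counter(ordinal_patterns(x, D))
    total = sum(counts.values())
    s = -sum((c / total) * math.log(c / total) for c in counts.values())
    return s / math.log(math.factorial(D)) if normalize else s


print(ordinal_patterns([1, 2, 3, 2, 1]))
print(permutation_entropy(list(range(10))))  # monotonic series: one pattern only
```

The actual values of the data never enter the result, only their relative order, which is the drawback noted above.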

Example: Logistic map

[Figure: logistic map example. Panels: time series; detail of the time series; histogram of the D=3 patterns (a “forbidden” pattern is marked); histogram of x(t).]

Ordinal analysis yields information about more expressed

and less expressed oscillation patterns in the data.

Logistic map: $x(i+1) = r\,x(i)\,[1 - x(i)]$

Normal bifurcation diagram Ordinal bifurcation diagram

Ordinal analysis yields complementary information

Xi

Map parameter Map parameter, r

Pattern 6 (210) is always forbidden;

pattern 1 (012) is more frequently

expressed as r increases

Ordinal

Probabilities

Example of application:

How optical chaos emerges from noise?

stochastic dynamics at low current,

signatures of “determinism” at higher current.

As the laser current increases,

temporal correlations emerge

J. Tiana et al PRA (2010); N. Rubido et al, PRE (2011)

Time

Laser intensity

C1

C2

Laser current (mA)

P(10)

P(01)

Laser current















https://journals.aps.org/pre/abstract/10.1103/PhysRevE.84.026202














Ordinal analysis identifies the onset of different dynamical

regimes, but does not distinguish “noise” and “chaos”

C. Quintero-Quiroz et al, Scientific Reports (2016)

Grey region: probabilities are consistent with the uniform

distribution ($P_i = 1/6 \approx 0.17\ \forall i$) with 99.7% confidence level

laser current

0.14

0.17

0.24






P(210) identifies dynamical regimes in parameter space

(pump current, feedback strength)

M. Panozzo et al, Chaos (2017)

P(210) Intensity pdf

0.17

0.19

0.15

Zooming into the region where spikes are well-defined, a

transition is detected (not captured by correlation analysis)

Laser current

Laser current

A. Aragoneses et al

Scientific Reports (2014)

Pi


















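Circle map data

A modified circle map: simple minimal model

$\varphi_{i+1} = \varphi_i + \rho + \frac{K}{2\pi}\left[\sin(2\pi\varphi_i) + \alpha_c \sin(4\pi\varphi_i)\right] + D\,\xi_i$

$X_i = \varphi_{i+1} - \varphi_i$

Same “clusters” & same hierarchical structure

$\rho$ = natural frequency / forcing frequency

K = forcing amplitude

D = noise strength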
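A minimal simulation sketch of the modified circle map. It assumes the map has the form $\varphi_{i+1} = \varphi_i + \rho + \frac{K}{2\pi}[\sin(2\pi\varphi_i) + \alpha_c \sin(4\pi\varphi_i)] + D\,\xi_i$ with $\xi_i$ Gaussian white noise of unit variance, and that the analysed series is $X_i = \varphi_{i+1} - \varphi_i$. The parameter values in the example are illustrative, not those of the paper.

```python
import math
import random


def modified_circle_map(n, rho, K, alpha_c, D, phi0=0.0, seed=0):
    """Return the increments X_i = phi_{i+1} - phi_i of the noisy map."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    phi = phi0
    increments = []
    for _ in range(n):
        forcing = (K / (2 * math.pi)) * (
            math.sin(2 * math.pi * phi) + alpha_c * math.sin(4 * math.pi * phi)
        )
        phi_next = phi + rho + forcing + D * rng.gauss(0.0, 1.0)
        increments.append(phi_next - phi)
        phi = phi_next
    return increments


# Illustrative (hypothetical) parameter values.
X = modified_circle_map(n=1000, rho=0.23, K=0.04, alpha_c=0.2, D=0.002)
print(min(X), max(X))
```

The series `X` can then be fed to an ordinal analysis in the same way as the laser inter-spike intervals.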

A. Aragoneses et al Scientific Reports (2014)

Empirical laser data

Laser current

0.17

0.19

0.15

Map parameter c

0.5

0.25

0

Pi

















0

laser data

Modulation amplitude 4% 0.1

0.2

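$\varphi_{i+1} = \varphi_i + \rho + \frac{K}{2\pi}\left[\sin(2\pi\varphi_i) + \alpha_c \sin(4\pi\varphi_i)\right] + D\,\xi_i$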

The modified map describes spike correlations in sensory

neurons (Neiman and Russell, PRE 2005)

Can we test its validity as a minimal model for the laser spikes?

Connection with neurons: the circle map

describes many excitable systems

Map parameter K

0.1

0.2

Pi

Ordinal probabilities uncover the regions of noisy locking

T. Sorrentino et al, JSTQE (2015)

Time

2:1

3:1

T 2 T mod

4:1

P(10)

P(3210)

0.8 % 1.6 %

Modulation frequency

Laser

current

http://www.fisica.edu.uy/~cris/Pub/JSTQE_2015.pdf







FHN model with Gaussian

white noise and weak

sinusoidal input:

spikes are noise-induced

Comparing with synthetic neuronal spikes: good agreement

Modulation amplitude

Empirical laser data

Modulation amplitude

Synthetic spikes

Aparicio-Reinoso, Torrent and Masoller, PRE (2016)

Pi

http://journals.aps.org/pre/abstract/10.1103/PhysRevE.94.032218








Transition to optical chaos: ordinal analysis distinguishes

different regimes.

Spike patterns that are more/less expressed are not always

detected by correlation analysis.

Minimal model identified.

Good agreement between optical & neuron (synthetic) spikes.

Open question: why are the ordinal probabilities “clustered”?

What did we learn?

Mapping a time series into a

network

A graph: a set of “nodes” connected by a set of “links”

What is a network?

The number of patterns increases as D!

a drawback and an advantage

Is a problem for short datasets

But is an opportunity to turn a time

series into a network by using the

patterns as “nodes” of the network.

And the links? Defined as the transition probability →

Adapted from M. Small (The University of Western Australia)

In each node i:

$\sum_j w_{ij} = 1$

Weight of node i: the

probability of pattern i

($\sum_i p_i = 1$)

Weighted and

directed network

Network-based diagnostic tools

• Entropy computed from node weights (permutation entropy): $s_p = -\sum_i p_i \log p_i$

• Average node entropy (entropy of the link weights)

• Asymmetry coefficient: normalized difference of transition

probabilities, P(‘01’→ ‘10’) - P(‘10’→ ’01’), etc.

(0 in a fully symmetric network;

1 in a fully directed network)
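The construction of the ordinal network and two of the diagnostics above can be sketched as follows. The sketch assumes that consecutive, overlapping windows define the transitions and that the node entropy is averaged without normalization; the asymmetry coefficient is not included because its normalization is not given here.

```python
import math
from collections import Counter, defaultdict


def ordinal_symbols(x, D=3):
    """Ordinal pattern (as a string of ranks) of each window of length D."""
    symbols = []
    for i in range(len(x) - D + 1):
        window = x[i:i + D]
        order = sorted(range(D), key=lambda k: window[k])
        ranks = [0] * D
        for rank, k in enumerate(order):
            ranks[k] = rank
        symbols.append("".join(str(r) for r in ranks))
    return symbols


def ordinal_network(x, D=3):
    """Node weights p_i and link weights w_ij (each row sums to 1)."""
    symbols = ordinal_symbols(x, D)
    node_weights = {s: c / len(symbols) for s, c in Counter(symbols).items()}
    transitions = defaultdict(Counter)
    for a, b in zip(symbols, symbols[1:]):
        transitions[a][b] += 1
    link_weights = {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in transitions.items()
    }
    return node_weights, link_weights


def entropy(probabilities):
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def average_node_entropy(link_weights):
    """Mean over nodes of the entropy of the outgoing link weights."""
    rows = [entropy(row.values()) for row in link_weights.values()]
    return sum(rows) / len(rows)


# A period-3 signal: three patterns visited in a fixed cyclic order.
node_weights, link_weights = ordinal_network([0, 1, 2] * 20)
print(entropy(node_weights.values()), average_node_entropy(link_weights))
```

For this periodic signal the entropy of the node weights is close to log 3, while the average node entropy vanishes because every transition is deterministic.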

A first test with

synthetic data

D=4

Detects the merging

of four branches, not

detected by the

Lyapunov exponent.

C. Masoller et al, NJP (2015)

Sp = PE

Sn=S(TPs)

Lyapunov

exponent

Map parameter

Slinks

ac

http://iopscience.iop.org/1367-2630/17/2/023068/pdf/1367-2630_17_2_023068.pdf






Apply the ordinal network method to

laser (VCSEL) empirical data

Two sets of experiments: intensity time series were recorded

‒ keeping constant the laser current.

‒ while increasing the laser current.

We analyzed the polarization that turns on / turns off.

Is it possible to anticipate the switching?

No if the switching is fully stochastic.

As the laser current increases

Time

Intensity @ constant current

Time

Early warning

Deterministic mechanisms

must be involved.

First set of experiments (the current is kept constant):

despite the stochasticity of the time series, the node

entropy “anticipates” the switching


Laser current

I

Laser current

I

Laser current

Node

entropy

sn

(D=3)

No

warning

L=1000

100 windows







The warning is robust with respect to the length

of the pattern D and the length of the window L

Node

entropy

5000 1000 L=500

D=3

Laser current

Laser current

L=1000

D=2 D=3 D=4

Node

entropy








In the second set of experiments (current increases

linearly in time): an early warning is also detected

Node

entropy

Time

With slightly

different

experimental

conditions: no

switching.


L=500, D=3

1000 time series

Time







Another way to represent a time series as a

network: the horizontal visibility graph (HVG)

Luque et al PRE (2009); Gomez Ravetti et al, PLOS one (2014)

Unweighted and undirected graph

Rule: data points i and j are connected if there is “visibility”

between them
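A minimal sketch of the horizontal visibility rule: points i and j are linked when every value between them is lower than both $x_i$ and $x_j$. The loop below is a simple quadratic-time version with an early exit, written for clarity rather than speed; the function names are hypothetical.

```python
import math
from collections import Counter


def horizontal_visibility_edges(x):
    """Edges (i, j), i < j, of the horizontal visibility graph of x."""
    edges = []
    n = len(x)
    for i in range(n - 1):
        highest_between = -math.inf
        for j in range(i + 1, n):
            # Visibility: all intermediate values lie below both end points.
            if highest_between < min(x[i], x[j]):
                edges.append((i, j))
            highest_between = max(highest_between, x[j])
            if highest_between >= x[i]:
                break  # nothing further to the right is visible from i
    return edges


def degree_distribution(edges, n):
    """P(k): fraction of the n nodes that have degree k."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return {k: c / n for k, c in sorted(Counter(degree).items())}


series = [3, 1, 2, 4]
edges = horizontal_visibility_edges(series)
print(edges, degree_distribution(edges, len(series)))
```

Consecutive points are always connected, so every node has degree of at least 1 (at least 2 away from the ends of the series).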

i

Xi

Example: uncovering temporal correlations in the

intensity of a fiber laser

How to characterize the graph?

Low → High pump power

“Laminar” → “Turbulence”

The degree distribution:

a simple way to characterize a graph

Strogatz, Nature 2001

Information-theory measures computed from the

degree distribution allow inferring the synthetic model

that most closely represents the laser empirical data

Carpi and Masoller, PRA (2018)

Degree k

P(k)

1.5 W 0.8 W 0.9 W

Gaussian

white

noise

HVG Entropy

Fisher information

1 W

1.5 W

Hurst exp.

https://journals.aps.org/pra/pdf/10.1103/PhysRevA.97.023842



Different ways of calculating the entropy S uncover

different features of the Laminar → Turbulence transition

Time

Aragoneses et al, PRL (2016)

“Raw” data

Surrogate

HVG or PE

“Thresholded” data

(the abrupt transition is robust with

respect to the selection of the threshold)

HVG

PE

S

2

S

https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.033902





The “usual” entropy (from the pdf of raw intensity values)

uncovers another feature of the transition


I(t)

I=0

=1

0.8 W

1.0 W

0.9 W

0.95 W

Time

S(“raw” pdf)






Changing the sampling time identifies “hidden” time

scales in the dynamics, undetected by correlation analysis

Below transition

at transition

above transition







The space-time representation of the intensity time series:

a convenient way to visualize the dynamics

Color

scale: Ii

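$\{I_1, I_2, \ldots, I_\tau, I_{\tau+1}, \ldots\} \rightarrow \begin{pmatrix} I_1 & I_2 & \cdots & I_\tau \\ I_{\tau+1} & I_{\tau+2} & \cdots & I_{2\tau} \\ I_{2\tau+1} & I_{2\tau+2} & \cdots & I_{3\tau} \\ \vdots & \vdots & & \vdots \end{pmatrix}$ (row index n)

[Figure: space-time plots for τ = 396 dt, 431 dt and 496 dt; vertical axis n.]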
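The space-time representation amounts to cutting the series into consecutive rows of length τ. A minimal sketch, assuming τ is given in samples and that an incomplete last row is discarded:

```python
def space_time_rows(intensity, tau):
    """Rows n = 0, 1, ... of the space-time plot, each tau samples long."""
    n_rows = len(intensity) // tau
    return [intensity[n * tau:(n + 1) * tau] for n in range(n_rows)]


# Toy signal with period 3 (hypothetical data).
signal = [0.1, 0.9, 0.4] * 4

matched = space_time_rows(signal, tau=3)     # tau equal to the period
mismatched = space_time_rows(signal, tau=4)  # tau slightly too long
print(matched)
print(mismatched)
```

When τ matches a characteristic time of the signal the rows line up into vertical structures; a mismatched τ makes the structures drift from row to row, which is why the plots change with τ.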







Transforming a time series (laser intensity) into a network

yields new insights into dynamical transitions.

“Early warning” of abrupt switching identified.

Transition seen in raw data different from thresholded data.

Ordinal analysis uncovers “hidden” characteristic time scales.

The space-time representation reveals “spatial” structures.

What did we learn?

How to extract instantaneous phase,

amplitude and frequency information

Are weather extremes becoming more frequent?

more extreme?

x

HT[x]

x

y=HT[x]

The Hilbert transform makes it possible to define an instantaneous

amplitude and phase for each data point in a time series

Surface air temperature (SAT)

HT[sin(t)]=cos(t)

A word of warning: a(t) and $\omega(t) = d\varphi/dt$ have a clear

physical meaning only if x(t) is a narrow-band signal

‒ a(t) is the envelope of x(t)

‒ $\omega(t)$ is the main frequency in the Fourier spectrum
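A didactic sketch of how amplitude, phase and frequency follow from the analytic signal $x + i\,HT[x]$. It builds the analytic signal with a naive discrete Fourier transform from the standard library, which is only practical for short series; dedicated FFT-based routines are the usual choice for real data. With the sign convention of this sketch $HT[\cos] = \sin$; other texts use the opposite sign.

```python
import cmath
import math


def analytic_signal(x):
    """x + i*HT[x], obtained by removing the negative-frequency half of the DFT."""
    n = len(x)
    spectrum = [
        sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            weight = 1.0  # zero frequency and Nyquist are kept once
        elif k < (n + 1) // 2:
            weight = 2.0  # positive frequencies are doubled
        else:
            weight = 0.0  # negative frequencies are removed
        spectrum[k] *= weight
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
        for m in range(n)
    ]


def hilbert_features(x, dt=1.0):
    """Instantaneous amplitude a(t), phase phi(t) and frequency omega(t)."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    phase = [cmath.phase(v) for v in z]
    # Phase increment between consecutive samples, wrapped to (-pi, pi].
    frequency = [cmath.phase(b * a.conjugate()) / dt for a, b in zip(z, z[1:])]
    return amplitude, phase, frequency


# Narrow-band test signal: 4 full cycles in 64 samples.
n, cycles = 64, 4
x = [math.cos(2 * math.pi * cycles * m / n) for m in range(n)]
amplitude, phase, frequency = hilbert_features(x)
print(amplitude[0], frequency[0])
```

For this pure tone the amplitude is constant and the frequency equals $2\pi \cdot 4/64$ rad per sample, which is the narrow-band situation where both quantities are easy to interpret.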

Problem: climate time series are not narrow-band

Usual solution (e.g. brain signals): isolate a narrow

frequency band

However, HT directly applied to surface air temperature

uncovers the “hot spots” that are most affected by climate change (CC).

Can we use the Hilbert amplitude, phase, frequency,

to identify and quantify regional climate change?

The data: surface air temperature (SAT)

Spatial resolution: 2.5° x 2.5° (10226 time series)

Daily resolution: 1979 – 2016 (13700 data points)

Where does the data come from?

European Centre for Medium-Range Weather Forecasts

(ECMWF, ERA-Interim).

Freely available.

Reanalysis = general atmospheric circulation model fed with

empirical data, where and when available (data assimilation).

Features extracted from each SAT time series

Time-averaged amplitude, $\langle a \rangle$

Time-averaged frequency, $\langle \omega \rangle$

Standard deviations, $\sigma_a$, $\sigma_\omega$

White = 0.017 rad/day = one cycle per year

The map of time average frequency uncovers

regions of fast frequency dynamics

Zappala, Barreiro and Masoller, Entropy (2016)


http://www.mdpi.com/1099-4300/18/11/408/pdf



Large frequency fluctuations due to strong precipitation

Standard deviation of

frequency fluctuations, $\sigma_\omega$

Annual mean precipitation

Zappala, Barreiro and Masoller, Entropy (2016)




Phase dynamics: temporal evolution of the cosine

of the Hilbert phase

Typical year El Niño year La Niña year

http://www.fisica.edu.uy/~cris/videos/map_typical.mp4

http://www.fisica.edu.uy/~cris/videos/map_ElNino.mp4

http://www.fisica.edu.uy/~cris/videos/map_LaNinal.mp4

SAT → average in a time window → Hilbert

Influence of filtering the data: high frequencies removed

No filter 1 month 3 months

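Relative decadal variations

$\Delta a = \dfrac{\langle a \rangle_{2007-2016} - \langle a \rangle_{1979-1988}}{\langle a \rangle_{1979-2016}}$

Relative variation is considered significant if:

$\Delta a > \langle \Delta a \rangle_s + 2\sigma_s$ or $\Delta a < \langle \Delta a \rangle_s - 2\sigma_s$

(mean $\langle \Delta a \rangle_s$ and standard deviation $\sigma_s$ from 100 surrogates)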

Relative variation of average Hilbert amplitude uncovers

regions where the seasonal cycle increased/decreased

Decrease of precipitation: the solar radiation that is not

used for evaporation is used to heat the ground.

Melting of sea ice: during winter the air temperature is

moderated by the sea and tends to be milder.

Relative change of time-averaged Hilbert frequency

consistent with a northward shift and enlargement of the ITCZ

First ten years

Last ten years

Frequency variations are consistent with variations

in the number of zero-crossings

In blue areas: average frequency decreases

In red areas: average frequency increases

Hilbert analysis applied to raw SAT data yields insight into

regional climate changes

Large variations of Hilbert amplitude interpreted as due to ice

melting (Arctic) or precipitation decrease (in Amazonia)

Large variations of Hilbert frequency interpreted as due to a

shift and enlargement of the ITCZ.

Summary

Bivariate data analysis tools:

“correlation / functional networks”

Functional brain network

Eguiluz et al, PRL 2005

“Functional” climate network

Donges et al,

Chaos 2015

Anomalies

(annual

solar cycle

removed)

Climate system

interpretation

(currents,

winds, etc.)

More than

10000

nodes

Similarity measure

+ threshold

Brain network Climate network

Weighted

degree

Graphical representations

How to select the threshold?

Increasing the threshold decreases the network connectivity

Barreiro, Marti and Masoller, Chaos (2011)

http://www.fisica.edu.uy/~cris/Pub/chaos_2011.pdf






Similarity measures between two time series X, Y:

Cross-correlation: $-1 \le \rho_{X,Y} \le 1$

$\rho_{X,Y} = \rho_{Y,X}$

The maximum of $\rho_{X,Y}(\tau)$ indicates the lag that renders X and Y best aligned.

Mutual information (MI): $p(x,y) = p(x)\,p(y) \Rightarrow MI = 0$, else $MI > 0$

$MI(x,y) = MI(y,x)$

MI can also be computed with a lag.

MI can also be computed from ordinal probabilities.
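Both similarity measures can be sketched with the standard library. The sketch assumes that the cross-correlation is the Pearson coefficient between x(t) and y(t + lag), and that the mutual information is estimated from two equally long symbol sequences (for example ordinal patterns or binned values); the function names are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient, between -1 and 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(x, y, lag):
    """Correlation between x(t) and y(t + lag), for lag >= 0."""
    return pearson(x[:len(x) - lag], y[lag:])


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag that maximizes the cross-correlation."""
    return max(range(max_lag + 1), key=lambda lag: lagged_correlation(x, y, lag))


def mutual_information(sx, sy):
    """MI in nats between two symbol sequences of equal length."""
    n = len(sx)
    joint = Counter(zip(sx, sy))
    px, py = Counter(sx), Counter(sy)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent symbols
print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))  # identical sequences
```

A lagged MI follows from the same slicing used in `lagged_correlation`, applied to the symbol sequences before calling `mutual_information`.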

Ordinal patterns are useful for climate data analysis:

allow selecting the time scale of the analysis

Example: el Niño index, monthly sampled

− Consecutive months (green):

(intra-season time-scale)

− Consecutive years (red):

− Inter-annual (blue)

$[\ldots\ x_i(t),\ x_i(t+1),\ x_i(t+2)\ \ldots]$

$[\ldots\ x_i(t),\ \ldots\ x_i(t+12),\ \ldots\ x_i(t+24)\ \ldots]$

Mutual information computed from ordinal probabilities

separates the time scales of the interactions

MI values of a reference point located in El Niño area

Pdf of data

values

3 months

Inter-

annual

3 years

Deza, Barreiro and Masoller EPJST (2013)

http://www.fisica.edu.uy/~cris/Pub/epjst_deza_2013.pdf














The directionality index yields information about

the net direction of information transfer

τ = 30 days

MI and D are both significant (> 3σ, bootstrap surrogates)

Deza, Barreiro and Masoller, Chaos 25, 033105 (2015)

A. Bahraminasab et al, PRL 100, 084101 (2008)

$I_{xy}(\tau)$: conditional MI





The lag quantifies the time-scale of information transfer

1 day 3 days

7 days

30 days

Video: directionality in reference point along the equator

http://www.fisica.edu.uy/~cris/videos/Directional.mp4













Nearby nodes have the

strongest links.

There are many methods to

infer bi-variate interactions

from observations.

Can we test them?

How to detect weak-but-significant links?

Main problem: the spatial embedding of the climate network

Maybe yes, by using a “toy model” where we know the real

connectivity (“the ground truth”).

Sensitivity (also called the

true positive rate):

proportion of existing links

that are correctly identified.

Specificity S (also called

the true negative rate):

proportion of non-existing

links that are correctly

identified.

False positive rate (“false

alarms”) proportion of non-

existing links that are

incorrectly identified.

Definitions

Source: wikipedia
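The three rates can be computed directly once the inferred links are compared with the ground truth. A minimal sketch for an undirected network, assuming links are inferred by thresholding a symmetric similarity matrix and that the ground truth contains both existing and non-existing links; the matrices below are made-up examples.

```python
def infer_links(similarity, threshold):
    """Adjacency matrix: a link wherever the similarity exceeds the threshold."""
    n = len(similarity)
    return [
        [1 if i != j and similarity[i][j] > threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def link_detection_rates(true_adj, inferred_adj):
    """Sensitivity, specificity and false positive rate over all node pairs."""
    tp = fp = tn = fn = 0
    n = len(true_adj)
    for i in range(n):
        for j in range(i + 1, n):  # undirected network: count each pair once
            exists, detected = bool(true_adj[i][j]), bool(inferred_adj[i][j])
            if exists and detected:
                tp += 1
            elif exists:
                fn += 1
            elif detected:
                fp += 1
            else:
                tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
    }


true_adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
similarity = [[1.0, 0.9, 0.6], [0.9, 1.0, 0.1], [0.6, 0.1, 1.0]]

for threshold in (0.5, 0.8):
    rates = link_detection_rates(true_adj, infer_links(similarity, threshold))
    print(threshold, rates)
```

Sweeping the threshold and plotting sensitivity against the false positive rate gives one ROC curve per similarity measure.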

Receiver operating characteristic (ROC curve)

Similarity measure 1



Source: wikipedia

Kuramoto oscillators in a random network

Phases (θ) CC MI MIOP

Aij is symmetric;

N=12 time series,

each $10^4$ data points.

“Observable” Y=sin(θ)

True positives False positives True positives False positives

Results of 100 simulations with different oscillators’ frequencies, random

matrices, noise realizations and initial conditions.

For each K, the threshold was varied to obtain optimal reconstruction.

With the instantaneous frequencies (dθ/dt) perfect

network inference is possible!

CC MI MIOP

BUT

• the number of oscillators is small (12),

• the coupling is symmetric (only 66 possible links) and

• the data sets are long ($10^4$ points)

G. Tirabassi et al, Sci. Rep. 5 10829 (2015)








We also analyzed experimental data recorded from 12 chaotic

Rössler electronic oscillators (symmetric and random coupling)

The Hilbert Transform

was used to obtain

phases from

experimental data

G. Tirabassi et al, Sci. Rep. 5 10829 (2015)









Results obtained with experimental data


Observed

variable (x)

Hilbert phase

Hilbert frequency

CC MI MIOP

‒ No perfect

reconstruction

‒ No important

difference

among the 3

methods & 3

variables

Transitions in complex systems often

lead to structural changes.

How to identify them?

Vegetation transition under the lens of network analysis

G. Tirabassi et al., Ecological

Complexity (2014)

Rainfall R

Biomass

B

http://www.fisica.edu.uy/~cris/Pub/Ecological_2014.pdf





‘‘Randomization’’ of the correlation network when the

tipping point is approached

clustering

assortativity

skewness kurtosis

The ‘‘Gaussianisation’’ of the distributions is

quantified by the Kullback–Leibler Distance

G. Tirabassi et al., Ecological Complexity 19, 148 (2014)

Open issue: the

“Gaussianisation”

might be a model-

specific feature.






Degree, centrality, assortativity distributions etc. provide

partial information.

How to define a measure that contains detailed

information about the global topology of a network, in a

compact way?

Node Distance Distributions (NDDs)

pi(j) of node “i“ is the fraction of nodes that are connected

to node i at distance j

If a network has N nodes:

NDDs = vector of N pdfs {p1, p2, …, pN}

If two networks have the same set of NDDs they have

the same diameter, average path length, etc.
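The node distance distributions can be obtained with one breadth-first search per node. A minimal sketch for an undirected, unweighted network with at least two nodes, given as an adjacency dictionary; it assumes the fractions are normalized by N - 1, so that unreachable nodes simply leave the distribution summing to less than 1.

```python
from collections import Counter, deque


def node_distance_distributions(adjacency):
    """For each node i, p_i(j): fraction of the other nodes at distance j."""
    n = len(adjacency)
    ndds = {}
    for source in adjacency:
        # Breadth-first search gives shortest-path distances from the source.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
        counts = Counter(d for node, d in distance.items() if node != source)
        ndds[source] = {d: c / (n - 1) for d, c in sorted(counts.items())}
    return ndds


# Path graph 0 - 1 - 2 (toy example).
path = {0: [1], 1: [0, 2], 2: [1]}
ndds = node_distance_distributions(path)
diameter = max(max(p) for p in ndds.values())
print(ndds, diameter)
```

The diameter and the average path length can be read off the set of distributions, which is why two networks with the same NDDs share them.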

In order to detect structural differences we need

a precise measure to compare networks

The Network Node Dispersion (NND) measures the

heterogeneity of the N pdfs {p1, p2, …, pN}

Quantifies the heterogeneity of connectivity distances.

How to condense the information contained in the

node distance distributions?

d = diameter

Example of application: in a random network the NDD

detects the percolation transition

Log(PN) P=connection probability

T. A. Schieber et al, Nat. Comm. (2017)

http://www.nature.com/articles/ncomms13928.pdf








Extensive numerical experiments demonstrate that

isomorphic graphs return D=0.

Computationally efficient.

Dissimilarity between two networks

w1=w2=0.5

compares the

averaged

connectivity

compares the

heterogeneity of the

connectivity distances

Comparing real networks to null models

DS

preserves

the degree

sequence;

2.0 also

preserves

the degree

correlation;

2.1 also the

clustering

coefficient;

2.5 also the

clustering

spectrum

Synthetic model for the Power Grid Network?

HVG

T. A. Schieber et al, Nat. Comm. (2017)

fBm Hurst=0.14









EEG data

‒ https://archive.ics.uci.edu/ml/datasets/eeg+database

‒ 64 electrodes placed on the subject’s scalp sampled at 256

Hz during 1s

‒ 107 subjects: 39 control and 68 alcoholic

Use HVG to transform each EEG TS into a network G.

Weight between two brain regions: 1-D(G,G’)

The resulting network represents the weighted similarity

between the brain regions of an individual.

We can compare the different individuals.

Another application: comparing brain networks

Hamming distance Dissimilarity measure

T. A. Schieber et al, Nat. Comm. 8, 13928 (2017)

Two brain regions are identified (‘nd’ and ‘y’): the weights of

the links are higher in control than in alcoholic subjects







Concluding

Symbolic analysis, information theory and complex

networks are useful tools for investigating complex signals.

Different techniques provide complementary information.

Take home messages

“…nonlinear time-series analysis has been used to great

advantage on thousands of real and synthetic data sets

from a wide variety of systems ranging from roulette wheels

to lasers to the human heart. Even in cases where the data

do not meet the mathematical or algorithmic requirements,

the results of nonlinear time-series analysis can be helpful

in understanding, characterizing, and predicting dynamical

systems…” Bradley and Kantz, CHAOS 25, 097610 (2015)

References

J. Tiana et al PRA 82, 013819 (2010)

N. Rubido et al, PRE 84, 026202 (2011)

Barreiro, Marti and Masoller, Chaos 21, 013101 (2011)

Deza, Barreiro and Masoller, Eur. Phys. J. ST 222, 511 (2013)

A. Aragoneses et al Optics Express 22, 4705 (2014)

A. Aragoneses et al Sci. Rep. 4, 4696 (2014)

G. Tirabassi et al., Ecological Complexity 19, 148 (2014)

C. Masoller et al, NJP 17, 023068 (2015)

Tirabassi et al, Sci. Rep. 5 10829 (2015)

Deza, Barreiro and Masoller, Chaos 25, 033105 (2015)

Aragoneses et al, PRL 116, 033902 (2016)

C. Quintero-Quiroz et al, Sci. Rep. 6, 37510 (2016)

Zappala, Barreiro and Masoller, Entropy 18, 408 (2016)

T. A. Schieber et al, Nat. Comm. 8, 13928 (2017)

M. Panozzo et al, Chaos 27, 114315 (2017)

Carpi and Masoller, PRA 97, 023842 (2018)

Zappala, Barreiro and Masoller, submitted (2018)


























































































http://aip.scitation.org/doi/abs/10.1063/1.4986441?ai=1gvoi&mi=3ricys&af=R








https://www.earth-syst-dynam-discuss.net/esd-2017-79/




UPC team and funding

Maria Masoliver, Pepe Aparicio Reinoso (neurons)

Taciano Sorrentino, Carlos Quintero, Jordi Tiana, Carme

Torrent (laser lab)

Andres Aragoneses, Laura Carpi (fiber laser data)

Ignacio Deza, Giulio Tirabassi, Dario Zappala, Marcelo

Barreiro (climate)

<[email protected]>

http://www.fisica.edu.uy/~cris/Talk/tutorial_hanover_2018.pdf
