Bio-Fingerprinting applied to polysomnographs

Bio-Fingerprinting applied to polysomnographs Student Lucchini Marta Supervisor Faraci Francesca Dalia Correlator Fiorillo Luigi Customer Faraci Francesca Dalia Course Computer Engineering Module M00009 Progetto di diploma Year 2018/2019 Date September 10, 2019




Contents

Abstract

1 Assigned project
1.1 Description
1.2 Tasks
1.3 Targets
1.4 Technologies
1.5 Changes in progress

2 Introduction

3 Used technologies

4 Algorithm
4.1 A7 algorithm
4.2 YASA
4.3 Detection of the spindles
4.3.1 Threshold 1: Relative power in the sigma band
4.3.2 Threshold 2: Moving correlation
4.3.3 Threshold 3: Moving RMS
4.3.4 Decision function
4.3.5 Additional information

5 Preliminary analysis

6 Analysis taking the rms peak of every spindle
6.1 Bar-plots for rms peaks
6.2 Scatter-plots for rms peaks
6.3 Range computed with logarithmic functions

7 Analysis to compute personalized thresholds
7.1 Bar-plots for rms inflections
7.2 Scatter-plots for rms inflections
7.3 Range computed with mathematical functions

8 Analysis on some rms inflections during the night

9 Confidence interval on the mean to compute the thresholds
9.1 F1-score as a measure of performance
9.2 Personalize the algorithm knowing only some spindles for patient

10 Preliminary analysis for adaptive method to compute thresholds
10.1 Exploration of the relationship between parameters
10.2 Polynomial regression on spindle inflections

11 Adaptive method to compute thresholds

12 Conclusions


List of Figures

1.1 Assigned project

5.1 EEG data and relative power
5.2 EEG data and correlation
5.3 EEG data and rms
5.4 Parameters in spindle with rms peak of 22.90
5.5 Parameters in spindle with rms peak of 12.17
5.6 Parameters in spindle with rms peak of 9.40
5.7 Parameters in spindle with rms peak of 11.57
5.8 Parameters in spindle with rms peak of 9.50
5.9 Parameters in spindle with rms peak of 10.44
5.10 Parameters in EEG with no spindle
5.11 Parameters in EEG with no spindle
5.12 Parameters in EEG with no spindle
5.13 Parameters in EEG pre-spindle
5.14 Parameters in EEG pre-spindle
5.15 Parameters in EEG pre-spindle
5.16 Parameters in EEG post-spindle
5.17 Parameters in EEG post-spindle
5.18 Parameters in EEG post-spindle

6.1 Rms peaks: rms bar-plot
6.2 Rms peaks: correlation bar-plot
6.3 Rms peaks: relative power bar-plot
6.4 Scatter plot relative power-correlation of spindle peaks
6.5 Scatter plot relative power-rms of spindle peaks
6.6 Scatter plot correlation-rms of spindle peaks
6.7 Rms peaks: logarithmic relation and range between relative power and correlation

7.1 Inflection in rms
7.2 Rms inflections: rms bar-plot
7.3 Rms inflections: correlation bar-plot
7.4 Rms inflections: relative power bar-plot
7.5 Scatter plot relative power-correlation of spindle inflections
7.6 Scatter plot relative power-rms of spindle inflections
7.7 Scatter plot correlation-rms of spindle inflections
7.8 Rms inflections: exponential relation and range between correlation and rms
7.9 Rms inflections: linear relation and range between relative power and correlation
7.10 Rms inflections: linear relation and range between relative power and rms

8.1 Scatter plot relative power-correlation of spindle inflections
8.2 Scatter plot relative power-correlation of no-spindle inflections
8.3 Scatter plot relative power-rms of spindle inflections
8.4 Scatter plot relative power-rms of no-spindle inflections
8.5 Scatter plot correlation-rms of spindle inflections
8.6 Scatter plot correlation-rms of no-spindle inflections


List of Tables

5.1 Results initial YASA algorithm

6.1 Rms peaks: rms bar-plot
6.2 Rms peaks: correlation bar-plot
6.3 Rms peaks: relative power bar-plot
6.4 Rms peaks: results with modified thresholds
6.5 Rms peaks: results with logarithmic function

7.1 Rms inflections: rms bar-plot
7.2 Rms inflections: correlation bar-plot
7.3 Rms inflections: relative power bar-plot
7.4 Rms peaks: results with modified thresholds
7.5 Rms inflections: results with exponential relation between correlation and rms
7.6 Rms inflections: results with linear relation between relative power and correlation
7.7 Rms inflections: results with linear relation between relative power and correlation

9.1 Initial YASA algorithm results with F1-score
9.2 Results using confidence interval
9.3 Results using confidence interval with 10/20 spindles in input

10.1 Rms inflections and minimum: R2 in polynomial regression
10.2 Rms inflections and minimum separated: R2 in polynomial regression
10.3 Total spindle durations: R2 in polynomial regression

11.1 Relations between the parameters: R2 in regression
11.2 Adaptive method: correlation fixed
11.3 Adaptive method: rms fixed
11.4 Adaptive method: rms fixed with upper limit
11.5 Adaptive method: relative power fixed
11.6 Adaptive method: relative power fixed with upper limit


Thanks

To my mother and my father, for their psychological and economic support, which has allowed me to conclude this important journey with serenity.

A special thanks also to my supervisor Francesca Dalia Faraci and my correlator Luigi Fiorillo for their help and availability, to all my professors for their teachings and to my colleagues for the great moments spent together in these three years and for their help and support.


Abstract

Polysomnography (PSG) is a multi-parametric test used in the study of sleep and as a diagnostic tool for sleep disorders. The first step in the quantitative analysis of polysomnographic recordings is the classification of sleep stages. It is possible to distinguish between wake, REM sleep, NREM sleep stages 1 to 4 and movement time.

To classify sleep stages, it is important to identify where certain patterns occur, such as sleep spindles. A sleep spindle is an electroencephalography (EEG) pattern defined as a train of distinct waves with a frequency of 11–16 Hz (most commonly 12–14 Hz) and a duration ≥ 0.5 s. Spindles are characteristic of stage 2 sleep, as they mark the transition from stage N1 (non-rapid eye movement, NREM1) to stage N2 (NREM2).

Sleep stage scoring relies heavily on visual pattern recognition by a human expert and is time-consuming and subjective. Thus, there is a need for automatic classification. Some automatic detectors already exist, but their accuracy is limited.

The aim of this project is to demonstrate that the performance of an existing sleep-spindle detector can be improved by modifying the algorithm so that it adapts to the characteristics of each patient.

Several analyses were performed, showing that a spindle detection algorithm can be customized for each patient. For the patient on which I tested the best method found, the F1-score increased from 0.48 with the initial algorithm to 0.55 when 20 spindles were given as input.


Chapter 1

Assigned project

Figure 1.1: Assigned project


1.1 Description

Polysomnography is an examination carried out to investigate possible sleep disorders. Several biological signals are recorded (EEG, EOG, EMG, ECG). The project consists in developing a system that performs pattern recognition on specific EEG signals recorded during sleep, adapting to the patient’s characteristics.

This study project is part of the European project E! SPAS (Sleep Physician Assistant System), in collaboration with two European companies and the Inselspital in Bern.

1.2 Tasks

• Brief analysis of the state of the art in PSG pattern recognition

• Development of automatic recognition tools for characteristic sleep signals, based both on existing open source tools and on results available in the specialist literature

• Integration in the algorithm of data related to the single patient (the idea is to personalize the recognition on the patient’s data, integrating in the algorithm the initial choices of the doctor or technician who identifies some patterns visually, from which the algorithm can then improve its identifications)

• Make the algorithm flexible in the identification of arousals, spindles and K-complexes, which have priority in sleep scoring and must be identified with maximum accuracy

• Test the accuracy of the algorithm on PSGs where all the patterns have been recognized and labeled by expert staff

1.3 Targets

• Development of knowledge related to polysomnography

• Development of knowledge related to pattern recognition and biofingerprinting

1.4 Technologies

Mainly Python. Some parts can be developed in R

1.5 Changes in progress

After studying the state of the art in PSG pattern recognition, I decided to dedicate the project to a specific pattern. I chose spindles because they are well documented and I found a good spindle detection algorithm as a starting point. However, all the analyses performed for the project can be extended to other EEG signals.


Chapter 2

Introduction

What is sleep? At the behavioral level, sleep can be defined as a reversible behavioral state of perceptual disengagement from and unresponsiveness to environmental stimuli. The sleep-wake cycle and the structure of sleep reflect the spontaneous activity of autoregulatory central nervous system processes. On the basis of some physiological parameters, sleep can be divided into two separate states: non-rapid eye movement (NREM) and rapid eye movement (REM) sleep. These differ from one another as well as from wakefulness. Conventionally, NREM sleep is subdivided into four stages, distinguished from each other principally on the basis of their different patterns of brain electrical activity, as measured by the electroencephalogram (EEG), which is considered the core measurement of polysomnography.

Polysomnography (PSG) is a multi-parametric test used in the study of sleep and as a diagnostic tool in sleep medicine. The test result is called a polysomnogram, also abbreviated PSG. Polysomnography is performed overnight, usually for 8 hours, and is a comprehensive recording of the biophysiological changes that occur during sleep.

The PSG monitors many body functions during sleep, including brain activity (EEG), eye movements (EOG), muscle activity or skeletal muscle activation (EMG), and heart rhythm (ECG). Polysomnography is used to diagnose, or rule out, many types of sleep disorders, including narcolepsy, idiopathic hypersomnia, periodic limb movement disorder (PLMD), REM behavior disorder, parasomnias, and sleep apnea. Although it is not directly useful in diagnosing circadian rhythm sleep disorders, it may be used to rule out other sleep disorders.

Returning to the EEG, its pattern in NREM sleep is synchronous, with characteristic waveforms such as sleep spindles, K-complexes and slow-frequency, high-amplitude waves (delta waves). By contrast, REM sleep is defined by low-voltage EEG activation, muscle atonia and episodic bursts of REMs. The rules for visual sleep scoring are provided by the recommendations of Rechtschaffen and Kales (R&K), published in 1968, and of the American Academy of Sleep Medicine (AASM), published in 2007. According to these manuals, it is possible to distinguish between wake, REM sleep, NREM sleep stages 1 to 4 and movement time. NREM sleep stages 1 and 2 are regarded as light sleep, and stages 3 and 4 are regarded as deep sleep or slow-wave sleep due to the predominance of slow delta waves in the EEG. Sleep scoring is performed for time segments of 20 or 30 seconds, which are referred to as epochs. Thus, 8 hours of sleep consist of 960 30-second epochs. The plot of a sequence of sleep stages is called a hypnogram.

Human sleep generally starts with stage 1 (N1), a very light sleep usually lasting up to a few minutes. Slow rolling eye movements are a feature of stage 1, and contractions of the muscles as well as hypnagogic jerks may occur. Next follows stage 2 (N2), a deeper state of sleep than stage 1, characterized by the occurrence of sleep spindles and K-complexes and an intermediate muscle tone. Stage 2 usually precedes deep sleep, that is, stages 3 and 4 (SWS, N3). The main characteristic of deep sleep is the presence of slow oscillations (1 Hz) and delta waves (1-4 Hz) in the EEG for at least 20% of the epoch duration. The muscle tone is low. Rapid eye movement sleep occurs periodically throughout the night and is characterized by rapid eye movements, fast low-amplitude EEG activity similar to the wake EEG, and a low muscle tone (atonia). The progression of the different stages is not random, but rather follows a cyclic alternation of NREM and REM sleep with a cycle duration of approximately 90 minutes. Healthy sleep consists of approximately 3-5 sleep cycles.

The classification of sleep stages is the first step in the quantitative analysis of polysomnographic recordings. Sleep stage scoring relies heavily on visual pattern recognition by a human expert and is time-consuming and subjective. Thus, there is a need for automatic classification. To classify sleep stages, it is important to identify where certain patterns occur, such as sleep spindles. A sleep spindle is an electroencephalography (EEG) pattern that results from specific variations in membrane potentials in the thalamocortical network of the brain. Spindles are defined as a train of distinct waves with a frequency of 11–16 Hz (most commonly 12-14 Hz) and a duration ≥ 0.5 s, usually maximal in amplitude at the central derivations. Spindles are a hallmark of stage 2 sleep, as they mark the transition from stage N1 (non-rapid eye movement, NREM1) to stage N2 (NREM2).

Although the function of sleep spindles is unclear, it is believed that they actively participate in the overnight consolidation of declarative memory (the conscious, intentional recollection of factual information, previous experiences and concepts) through the reconsolidation process. The density of spindles has been shown to increase after extensive learning of declarative memory tasks, and the degree of increase in stage 2 spindle activity correlates with memory performance.

Sleep spindle activity has furthermore been found to be associated with the integration of new information into existing knowledge, as well as with directed remembering and forgetting (fast sleep spindles). Moreover, sleep spindles closely modulate interactions between the brain and its external environment; they essentially moderate responsiveness to sensory stimuli during sleep. Recent research has revealed that spindles distort the transmission of auditory information to the cortex, isolating the brain from external disturbances during sleep. During NREM sleep, the brain waves produced by people with schizophrenia lack the normal pattern of slow and fast spindles. Loss of sleep spindles is also a feature of familial fatal insomnia, a prion disease. Changes in spindle density are observed in disorders such as epilepsy and autism.

Although visual inspection by experts is the gold standard of sleep spindle detection, the rapid increase in research on sleep spindles has led to various automated methods of spindle detection being proposed to reduce subjective biases and increase reliability and objectivity. The major advantages of automated methods are faster, more reproducible and more systematic scoring. They extract features from EEG data and apply specific thresholds to identify features corresponding to sleep spindles. An approach combining a standardized band-pass filter, or a custom frequency range filter, with an amplitude threshold has been commonly used in the research literature, with a reported sensitivity of approximately 90%. Time-frequency analysis has also been applied to spindle detection, using wavelet transformation and matching pursuit. More recently, sophisticated automatic sleep spindle detection methods using artificial neural networks have been developed, with reported agreement with experts ranging from 85% to 96%.

Although there is ample evidence that many automated methods have an acceptable agreement with experts, they are known to have occasional problems differentiating ambiguous oscillations (e.g., alpha versus spindles). They can also be highly influenced by the algorithm settings chosen by researchers (e.g., spindle duration, frequency, and amplitude characteristics). Spindle density and characteristics such as mean oscillation frequency, amplitude and duration appear to be trait-like, because they are stable over time (inter-night stability) for the same subject but vary considerably between subjects. To overcome this limitation, and to reduce the time experts spend detecting spindles manually, adapting the detector to the patient’s characteristics is a potential technique. The aim of the present project is to demonstrate that the performance of an existing sleep-spindle detector can be improved by modifying the algorithm so that it adapts to the characteristics of each patient.


Chapter 3

Used technologies

In this project I used the PyCharm development environment to program in Python, the language in which the initial algorithm is written. To perform the analysis I used several libraries. The main ones are numpy and pandas for data manipulation, plotly and matplotlib for displaying graphs, and mne and scipy for the analysis part. Furthermore, I used Microsoft Excel to collect and examine data in table form and GeoGebra to explore functions useful for the data analysis.


Chapter 4

Algorithm

The starting point of this project is a sleep-spindle detector that emulates human scoring: an algorithm called YASA. To explain how it works, I first describe the A7 algorithm, because YASA is largely inspired by it, and then the differences between the two methods.

4.1 A7 algorithm

The A7 algorithm runs on a single EEG channel. In my study I used the C3-M2 channel to perform the spindle detection, since spindle amplitude is maximal at the central derivations. First, the detector applies a filter between 0.3 and 30 Hz, according to standard practice for clinical polysomnography. A second filter, between 11 and 16 Hz, isolates the trains of sigma waves that occur during a spindle. We will call these signals EEGbf and EEGσ, respectively. To detect the spindles, the algorithm relies on four parameters:

• Absolute sigma power: used to identify trains of sigma waves. An increase of power in the sigma band means an increase in the energy of the signal EEGσ

• Relative sigma power: used to ensure that the increase of power is specific to the sigma band in the filtered signal EEGbf

• Sigma covariance: used to identify a high covariance between EEGσ and EEGbf. A high sigma covariance indicates that EEGσ and EEGbf vary together

• Sigma correlation: used to identify a high correlation between EEGσ and EEGbf. A high sigma correlation indicates that the changes in EEGbf are driven by the changes in EEGσ
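The two filtered signals on which these parameters are built can be sketched as follows. This is only a minimal illustration, not the A7 implementation: the Butterworth design, the filter order and the function names are assumptions, and only the two pass bands (0.3-30 Hz and 11-16 Hz) come from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(data, sf, low_hz, high_hz, order=4):
    """Zero-phase Butterworth band-pass filter (the design is an assumption)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sf, output="sos")
    return sosfiltfilt(sos, data)


def a7_input_signals(eeg, sf):
    """Return (eeg_bf, eeg_sigma), the two signals named in the text."""
    eeg_bf = bandpass(eeg, sf, 0.3, 30.0)      # broadband signal, EEGbf
    eeg_sigma = bandpass(eeg, sf, 11.0, 16.0)  # sigma-band signal, EEGσ
    return eeg_bf, eeg_sigma


# Synthetic check: a 13 Hz tone (inside the sigma band) plus a 3 Hz tone (outside).
sf = 125.0
t = np.arange(int(20 * sf)) / sf
sigma_tone = np.sin(2 * np.pi * 13 * t)
slow_tone = np.sin(2 * np.pi * 3 * t)
eeg_bf, eeg_sigma = a7_input_signals(sigma_tone + slow_tone, sf)
```

On this synthetic input, `eeg_sigma` should keep the 13 Hz component and suppress the 3 Hz one, while `eeg_bf` keeps both.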

These four parameters are computed on a window of 0.3 seconds every 0.1 seconds over the whole EEG recording. This window length was chosen to allow the detection of spindles as short as 0.3 seconds. Events that last less than 0.3 seconds or more than 2.5 seconds are discarded. The A7 algorithm detects a spindle when the four A7 sigma parameters exceed their respective thresholds. These four thresholds were established with a training dataset.
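The windowing described above can be illustrated for two of the four parameters, absolute sigma power and sigma covariance. In this sketch they are computed as a plain mean square and a plain sample covariance, which is a simplification and not necessarily the normalisation used by A7; the function names are hypothetical.

```python
import numpy as np


def window_starts(n_samples, sf, window_s=0.3, step_s=0.1):
    """Start indices and length (in samples) of the sliding windows."""
    win = int(round(window_s * sf))
    step = int(round(step_s * sf))
    return range(0, n_samples - win + 1, step), win


def windowed_power_and_covariance(eeg_bf, eeg_sigma, sf):
    """Per-window mean square of EEGσ and covariance between EEGσ and EEGbf."""
    starts, win = window_starts(len(eeg_sigma), sf)
    abs_power, covariance = [], []
    for start in starts:
        bf = eeg_bf[start:start + win]
        sg = eeg_sigma[start:start + win]
        abs_power.append(np.mean(sg ** 2))
        covariance.append(np.mean((bf - bf.mean()) * (sg - sg.mean())))
    return np.array(abs_power), np.array(covariance)


# Toy signals: the broadband signal contains the sigma activity plus a slow wave.
sf = 100.0
t = np.arange(int(10 * sf)) / sf
eeg_sigma = np.sin(2 * np.pi * 13 * t)
eeg_bf = eeg_sigma + 0.5 * np.sin(2 * np.pi * 3 * t)
abs_power, covariance = windowed_power_and_covariance(eeg_bf, eeg_sigma, sf)
```

With a 10-second signal at 100 samples per second, a 0.3-second window and a 0.1-second step give 98 windows, that is, one value of each parameter every 100 ms.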

To analyze the performance of the detector, three metrics were computed:

• Recall (sensitivity): the proportion of real spindles that are detected

• Precision: the proportion of detected events that are real spindles

• F1-score: the harmonic mean of recall and precision
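These three metrics follow directly from event counts. The sketch below uses a hypothetical function name and purely illustrative counts that are not taken from any dataset in this work.

```python
def detection_scores(n_matched, n_detected, n_real):
    """Precision, recall and F1-score from by-event counts.

    n_matched  : detected events that correspond to a real spindle
    n_detected : all events returned by the detector
    n_real     : all spindles annotated in the reference scoring
    """
    precision = n_matched / n_detected if n_detected else 0.0
    recall = n_matched / n_real if n_real else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only: 60 matches out of 80 detections and 100 real spindles.
precision, recall, f1 = detection_scores(n_matched=60, n_detected=80, n_real=100)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because the F1-score is a harmonic mean, it stays low when either precision or recall is low, which makes it a stricter summary than the arithmetic mean of the two.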

To calculate these metrics, the results computed by A7 were compared to a dataset containing the manual spindle detection performed by five human experts. The by-event performance of the A7 spindle detector was 74% precision, 68% recall and an F1-score of 0.70. This performance was equivalent to that of an individual human expert (average F1-score = 0.67).

4.2 YASA

The main differences between YASA and A7 are:

• YASA uses three thresholds (relative power, root mean square and correlation) instead of the four used by A7

• The windowed detection signals are resampled to the original time vector of the data using cubic interpolation, resulting in a pointwise detection signal (one value at every sample). The time resolution of YASA is therefore higher than that of the A7 algorithm. This allows the beginning, end and duration of the spindles to be detected more precisely (typically, A7 = 100 ms and YASA = 10 ms)

• The relative power in the sigma band is computed using a Short-Time Fourier Transform

• The median frequency and absolute power of each spindle are computed using a Hilbert transform

• YASA computes some additional spindle properties, such as the symmetry index and the number of oscillations

• Potential sleep spindles are discarded if their duration is below 0.5 seconds or above 2 seconds. These values are respectively 0.3 and 2.5 seconds in the A7 algorithm

• YASA incorporates an automatic rejection of pseudo or fake events based on an Isolation Forest algorithm
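The resampling step in the second point above can be sketched with a cubic interpolation from one value per 100 ms window to one value per sample. This is an illustration under stated assumptions, not YASA's code: the function name, the toy detection signal and the way values outside the first and last window are handled are all hypothetical choices.

```python
import numpy as np
from scipy.interpolate import interp1d


def to_pointwise(window_times, window_values, sample_times):
    """Cubic interpolation of a windowed detection signal onto every sample."""
    interpolator = interp1d(
        window_times, window_values, kind="cubic",
        bounds_error=False,
        # Outside the first/last window centre, hold the edge value (assumption).
        fill_value=(window_values[0], window_values[-1]),
    )
    return interpolator(sample_times)


sf = 100.0                                    # 10 ms between samples
sample_times = np.arange(int(10 * sf)) / sf
window_times = 0.15 + 0.1 * np.arange(98)     # one value every 100 ms
window_values = np.sin(2 * np.pi * 0.2 * window_times)  # toy detection signal
pointwise = to_pointwise(window_times, window_values, sample_times)
```

At 100 samples per second the interpolated signal has one value every 10 ms instead of every 100 ms, which matches the resolutions quoted in the list.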


4.3 Detection of the spindles

To understand how the algorithm works, it is important to know the meaning of the three

parameters used to detect the spindles.

4.3.1 Threshold 1: Relative power in the sigma band

When a spindle occurs, an increase of energy is expected in the sigma frequency range (11-16 Hz). A Short-Time Fourier Transform (STFT) is used to calculate the power in the sigma band relative to the total power in the broadband frequency range (1-30 Hz). It is performed on consecutive epochs of 2 seconds with an overlap of 200 ms. To ensure that at least 20% of the signal’s total power is contained within the sigma band, a threshold of 0.2 has been fixed: it is exceeded whenever a sample has a relative power in the sigma frequency range ≥ 0.2.
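A minimal sketch of this first parameter follows. It is not YASA's implementation: the Hann taper and the function name are assumptions, and the 200 ms is used here as the hop between consecutive 2-second windows, which is one possible reading of the "overlap of 200 ms" mentioned above.

```python
import numpy as np


def relative_sigma_power(data, sf, window_s=2.0, step_s=0.2,
                         sigma=(11.0, 16.0), broad=(1.0, 30.0)):
    """Sigma-band power divided by broadband power, one value per STFT window."""
    win = int(round(window_s * sf))
    step = int(round(step_s * sf))   # hop between windows: an assumption
    taper = np.hanning(win)          # Hann taper: an assumption
    freqs = np.fft.rfftfreq(win, d=1.0 / sf)
    in_sigma = (freqs >= sigma[0]) & (freqs <= sigma[1])
    in_broad = (freqs >= broad[0]) & (freqs <= broad[1])
    times, rel_pow = [], []
    for start in range(0, len(data) - win + 1, step):
        spectrum = np.fft.rfft(data[start:start + win] * taper)
        power = np.abs(spectrum) ** 2
        total = power[in_broad].sum()
        rel_pow.append(power[in_sigma].sum() / total if total > 0 else 0.0)
        times.append((start + win / 2) / sf)  # window centre in seconds
    return np.array(times), np.array(rel_pow)


sf = 125.0
t = np.arange(int(10 * sf)) / sf
spindle_like = np.sin(2 * np.pi * 13 * t)  # energy inside the sigma band
slow_wave = np.sin(2 * np.pi * 3 * t)      # energy outside the sigma band
_, rel_spindle = relative_sigma_power(spindle_like, sf)
_, rel_slow = relative_sigma_power(slow_wave, sf)
above_threshold = rel_spindle >= 0.2       # threshold of 0.2 taken from the text
```

The 13 Hz tone should exceed the 0.2 threshold in every window, while the 3 Hz tone should stay well below it.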

4.3.2 Threshold 2: Moving correlation

This parameter is used to ensure that the changes in EEGbf are driven by the changes in EEGσ. It is calculated with a sliding window of 300 ms and a step of 100 ms. The Pearson correlation coefficient between the EEGbf signal and the EEGσ signal is computed, and the threshold is exceeded every time a sample has a correlation value r ≥ 0.65.

4.3.3 Threshold 3: Moving RMS

To detect an increase of energy in the sigma band, a third threshold is defined by computing a moving root mean square (RMS) of EEGσ, with a window size of 300 ms and a step of 100 ms. The threshold is exceeded every time a sample has an RMS ≥ 1.5.
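Thresholds 2 and 3 share the same 300 ms window and 100 ms step, so both moving signals can be sketched in a single loop. The function name and the toy signals are hypothetical, and the thresholds 0.65 and 1.5 are applied as plain cut-offs, as they are stated in the text.

```python
import numpy as np


def moving_corr_and_rms(eeg_bf, eeg_sigma, sf, window_s=0.3, step_s=0.1):
    """Moving Pearson correlation (EEGbf vs EEGσ) and moving RMS of EEGσ."""
    win = int(round(window_s * sf))
    step = int(round(step_s * sf))
    times, corr, rms = [], [], []
    for start in range(0, len(eeg_sigma) - win + 1, step):
        bf = eeg_bf[start:start + win]
        sg = eeg_sigma[start:start + win]
        bf_c, sg_c = bf - bf.mean(), sg - sg.mean()
        denom = np.sqrt(np.sum(bf_c ** 2) * np.sum(sg_c ** 2))
        # A flat window has no defined correlation; report 0 in that case.
        corr.append(np.sum(bf_c * sg_c) / denom if denom > 0 else 0.0)
        rms.append(np.sqrt(np.mean(sg ** 2)))
        times.append((start + win / 2) / sf)  # window centre in seconds
    return np.array(times), np.array(corr), np.array(rms)


# Toy case: the broadband signal is made only of sigma activity.
sf = 125.0
t = np.arange(int(4 * sf)) / sf
eeg_sigma = 3.0 * np.sin(2 * np.pi * 13 * t)
eeg_bf = eeg_sigma.copy()
times, corr, rms = moving_corr_and_rms(eeg_bf, eeg_sigma, sf)
corr_ok = corr >= 0.65   # Threshold 2, value from the text
rms_ok = rms >= 1.5      # Threshold 3, value from the text
```

In this toy case the correlation is 1 in every window and the RMS is close to 3/√2 ≈ 2.1, so both thresholds are exceeded throughout.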

4.3.4 Decision function

After calculating each parameter, the values are interpolated using cubic interpolation to obtain one value for each time point. A spindle is detected when at least two of the three thresholds are exceeded. Furthermore, spindles that are too close to each other (less than 500 ms apart) are merged, and those that are too short (less than 0.5 s) or too long (more than 2 s) are removed.
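The decision rule described above can be sketched as follows, assuming the three parameters have already been interpolated to one value per sample. The function name, the argument names and the toy data are hypothetical; the vote of at least two out of three, the 500 ms merge distance and the 0.5-2 s duration limits are taken from the text.

```python
import numpy as np


def detect_events(rel_pow, corr, rms, sf, thresholds=(0.2, 0.65, 1.5),
                  min_votes=2, merge_gap_s=0.5, min_dur_s=0.5, max_dur_s=2.0):
    """Return (start, end) times in seconds of the detected events."""
    votes = ((rel_pow >= thresholds[0]).astype(int)
             + (corr >= thresholds[1]).astype(int)
             + (rms >= thresholds[2]).astype(int))
    active = votes >= min_votes
    # Locate contiguous runs of active samples as [start, stop) index pairs.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    runs = zip(edges[::2], edges[1::2])
    # Merge runs separated by less than merge_gap_s.
    merged = []
    for start, stop in runs:
        if merged and (start - merged[-1][1]) / sf < merge_gap_s:
            merged[-1][1] = stop
        else:
            merged.append([start, stop])
    # Keep only events whose duration lies within the allowed range.
    return [(float(a / sf), float(b / sf)) for a, b in merged
            if min_dur_s <= (b - a) / sf <= max_dur_s]


# Toy pointwise signals at 100 samples per second.
sf = 100.0
mask = np.zeros(1000, dtype=bool)
mask[100:140] = True   # two close bursts, 0.2 s apart: merged into one event
mask[160:200] = True
mask[500:520] = True   # 0.2 s long: too short
mask[700:950] = True   # 2.5 s long: too long
rel_pow = np.where(mask, 0.3, 0.05)
corr = np.where(mask, 0.8, 0.1)
rms = np.full(1000, 1.0)   # never exceeds its threshold, so only two votes
events = detect_events(rel_pow, corr, rms, sf)
print(events)
```

Merging is applied before the duration check, following the order in which the two steps are described, so two short bursts that are close together can still form one valid event.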

4.3.5 Additional information

The three parameters are computed and interpolated by the algorithm so that they take one value for each sample of the recording. The sampling frequency used for the project is 125 samples per second, meaning that every second contains 125 values of each parameter. Since spindles usually last 1 or 2 seconds, most spindle events contain roughly 125 to 250 values of each parameter.


Chapter 5

Preliminary analysis

The aim of this project is to modify the YASA algorithm so that it can be personalized for each patient. To understand whether this is possible, it is important to know whether the performance of the algorithm changes across patients and whether modifying the thresholds changes the results. To do this, I collected ten electroencephalograms from ten patients and tested the algorithm without modifications. For every EEG, I only had the annotations of the spindles detected by the REM Logic detector, so the matching computed in this preliminary analysis is not accurate. To count the matches between detected spindles and real ones, I allowed a tolerance of plus or minus two seconds between the beginning of the detected spindle and the beginning of the real one. With a multi-threaded script I tried every combination of thresholds for the three parameters that could make sense. I could not try combinations with more than two significant figures because this would have taken months. The performance differed from file to file, meaning that each patient is sensitive to changes in the thresholds.
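The matching with a two-second tolerance can be sketched as below. This is only an illustration: the function name `count_matches` and the greedy one-to-one pairing (each detected spindle can match at most one real spindle) are assumptions, since the exact pairing rule is not specified here.

```python
def count_matches(real_starts, detected_starts, tolerance_s=2.0):
    """Return (matched, not_detected) for two lists of start times in seconds."""
    remaining = sorted(detected_starts)
    matched = 0
    for real in sorted(real_starts):
        for i, det in enumerate(remaining):
            # A detection matches when it starts within the tolerance
            if abs(det - real) <= tolerance_s:
                matched += 1
                del remaining[i]  # each detection is used only once
                break
    return matched, len(real_starts) - matched
```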

Once it was clear that different thresholds could be computed for every patient, the next step was to find out how these variables change in every subject and how they relate to each other. Before searching for mathematical relations, I wanted to see how the values assumed by the three parameters during a spindle look when plotted in a graph. From now on, all the analyses were performed on a single file, for which I know all the spindles that occurred on channel C4-M1, since spindle activity is observed mainly at the central derivation. The number of spindles annotated by the neurologist is 838. The results computed by the algorithm on this file are:

Table 5.1: Results initial YASA algorithm

Real spindles | Detected spindles | Matching | Not detected spindles
838 | 904 | 417 | 421


First of all, I made three graphs, one per parameter, with the EEG data values on the x axis and the corresponding values of the parameter computed by the algorithm on the y axis. I noticed that this way of proceeding does not make sense, because the values were scattered throughout the graph without any relation.

Figure 5.1: EEG data and relative power

Figure 5.2: EEG data and correlation


Figure 5.3: EEG data and rms

What made more sense was to plot how the three parameters vary over time. As can be seen in the following images, when a spindle occurs the rms parameter seems to grow very fast, and the relative power also shows a slight increase. Correlation, on the other hand, does not seem to follow a specific trend.

Figure 5.4: Parameters in spindle with rms peak of 22.90


Figure 5.5: Parameters in spindle with rms peak of 12.17

Figure 5.6: Parameters in spindle with rms peak of 9.40

Figure 5.7: Parameters in spindle with rms peak of 11.57


Figure 5.8: Parameters in spindle with rms peak of 9.50

Figure 5.9: Parameters in spindle with rms peak of 10.44

Looking at these graphs, it may seem natural to think that identifying a sudden increase in the rms parameter is sufficient to identify a spindle. Unfortunately, as can be seen in the following images, which show the trend of the parameters in the absence of spindles, this parameter undergoes abrupt changes very often during the night.


Figure 5.10: Parameters in EEG with no spindle

Figure 5.11: Parameters in EEG with no spindle

Figure 5.12: Parameters in EEG with no spindle


Figure 5.13: Parameters in EEG pre-spindle

Figure 5.14: Parameters in EEG pre-spindle

Figure 5.15: Parameters in EEG pre-spindle


Figure 5.16: Parameters in EEG post-spindle

Figure 5.17: Parameters in EEG post-spindle

Figure 5.18: Parameters in EEG post-spindle


It is interesting to understand whether the values of the rms peaks during the spindle events are related to the values assumed by the other parameters and/or fall within a narrow range. To do this, I first found the rms peaks of the spindles and the relative power and correlation values at the same time points. Once I had done this, I started the analysis with the help of Excel.


Chapter 6

Analysis taking the rms peak of every spindle

To understand in which ranges most of the points fall, and therefore find reasonable thresholds for this patient, I plotted bar-plots for every parameter. To discover relations between the parameters, I plotted three scatter-plots that relate the three variables two by two.

6.1 Bar-plots for rms peaks

The YASA algorithm detects a spindle where two of the three parameters exceed a pre-established threshold for more than 0.5 seconds and less than 2 seconds. To improve the performance of the detector by adapting it to the patient in question, it is important to understand how the thresholds could be changed. The following bar-plots show which values the three parameters assume most often at the rms peaks.


Table 6.1: Rms peaks: rms bar-plot

Class Frequency

3 0

4 1

5 3

6 8

7 32

8 55

9 103

10 99

11 114

12 88

13 82

14 64

15 52

16 43

17 34

18 17

19 12

20 14

21 9

22 1

23 4

24 2

25 0

26 0

27 0

28 0

29 0

30 0

31 0

32 1

Other 0

Figure 6.1: Rms peaks: rms bar-plot

Most rms values fall in the range between 7 and 20.


Table 6.2: Rms peaks: correlation bar-plot

Class Frequency

0.1 0

0.2 0

0.3 4

0.4 9

0.5 26

0.6 92

0.7 125

0.8 242

0.9 247

1 93

Other 0

Figure 6.2: Rms peaks: correlation bar-plot

Most correlation values fall in the range between 0.5 and 1.

Table 6.3: Rms peaks: relative power bar-plot

Class Frequency

0 0

0.1 66

0.2 154

0.3 206

0.4 163

0.5 128

0.6 66

0.7 41

0.8 12

0.9 2

1 0

Other 0

Figure 6.3: Rms peaks: relative power bar-plot

Most relative power values fall in the range between 0.1 and 0.8.

From these observations, I modified the algorithm, giving the parameters a lower and an upper threshold. These are the results:


Table 6.4: Rms peaks: results with modified thresholds

Relative power | Correlation | Rms | Detected | Matching | Not detected
0.2 - ∞ | 0.65 - ∞ | 6.5 - 20.5 | 3250 | 786 | 52
0.2 - ∞ | 0.65 - ∞ | 7 - 20 | 5712 | 760 | 78
0.2 - ∞ | 0.45 - 1.05 | 7 - 20 | 2528 | 752 | 86
0.2 - ∞ | 0.65 - ∞ | 6.5 - 16 | 3192 | 740 | 98
0.1 - 0.8 | 0.65 - ∞ | 7 - 20 | 1961 | 617 | 221

Looking at these results, it can be noted that the number of detected spindles is very high, meaning that there is a large number of false positives. Reducing the range does not improve the result: the number of false positives decreases, but the number of matched spindles also decreases a lot, so the performance does not get better. The next step is to try to define functions that describe the trend of the parameters.

6.2 Scatter-plots for rms peaks

Linking the parameters two by two and searching for a relation between them could be a

good way to find ranges that change as the parameter values change. To reach this goal, it

is useful to look at the following scatter-plots.

Figure 6.4: Scatter plot relative power-correlation of spindle peaks


Figure 6.5: Scatter plot relative power-rms of spindle peaks

Figure 6.6: Scatter plot correlation-rms of spindle peaks

From these images, there appears to be a logarithmic relationship between relative power and correlation. The points on the other two graphs are more scattered, and I cannot see a clear relation between the parameters.


6.3 Range computed with logarithmic functions

Starting from the hypothesis that a logarithmic link exists between relative power and correlation, I tried to compute a range delimited by two logarithmic functions on the basis of the scatter-plot in Figure 6.4 and the function calculated by Excel:

$$y = 0.1614\ln(x) + 0.9515$$

where x is the relative power and y the correlation.

With the help of GeoGebra, I looked for the best functions that contain all or almost all the points in the graph while maintaining a logarithmic trend. I chose these two functions:

$$y = 0.11\ln(x) + 1.1$$

for the upper limit and

$$y = 0.23\ln(x) + 0.8$$

for the lower limit.

Figure 6.7: Rms peaks: logarithmic relation and range between relative power and correlation

Now we can try to let the algorithm find a spindle only when these limits are respected.
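As an illustration, the check against the two logarithmic limits could look like the sketch below. The function name `within_log_range` is hypothetical, and treating a non-positive relative power as outside the range is an assumption made only to keep the logarithm defined.

```python
import math

def within_log_range(rel_pow, corr):
    """True when `corr` lies between the lower and upper logarithmic limits."""
    if rel_pow <= 0:
        return False  # ln(x) is undefined here; assumed to be out of range
    upper = 0.11 * math.log(rel_pow) + 1.1
    lower = 0.23 * math.log(rel_pow) + 0.8
    return lower <= corr <= upper
```

For a relative power of 0.5, for example, the limits evaluate to roughly 0.64 and 1.02, so a correlation of 0.8 is accepted and a correlation of 0.5 is rejected.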


Table 6.5: Rms peaks: results with logarithmic function

Relative power | Correlation | Rms | Detected | Matching | Not detected
0.2 - ∞ | 0.65 - ∞ | 6.5 - 20.5 | 1584 | 547 | 291
0.2 - ∞ | 0.65 - ∞ | 7 - 20 | 1423 | 528 | 310
0.2 - ∞ | 0.45 - 1.05 | 7 - 20 | 1691 | 579 | 259
0.2 - ∞ | 0.65 - ∞ | 6.5 - 16 | 1515 | 488 | 350
0.1 - 0.8 | 0.65 - ∞ | 7 - 20 | 1934 | 614 | 224

Looking at these results, it seems that this approach does not work; the reason is discussed later. Given the poor results, it is probably better to follow the approach of the initial YASA algorithm and try to define thresholds personalized for every patient. In this way, the results achieved with the new thresholds can also be compared with the initial ones without changing the algorithm drastically. Remember that the aim of this project is to demonstrate that a spindle detector can be personalized to obtain better results.


Chapter 7

Analysis to compute personalized thresholds

The analyses discussed in this chapter are the same as those of the previous one, because the principle is still to define a range for the parameter values that improves the detection of the spindles, and to look for relations between the parameters. What changes compared to what has been done so far is that we no longer consider the rms peaks. Looking at the graphs representing the trend of the parameters, we can notice that during a spindle event, before the rms reaches its peak, there usually seems to be an inflection like the one circled in red:

Figure 7.1: Inflection in rms

We could try to use the central point of this inflection as a threshold that must be exceeded to detect the spindle. However, not all spindles have this inflection. For those without it, we could find the minimum value before the peak, at the beginning of the spindle, and then take the point that lies forward in time from it by a quarter of the distance between the minimum and the peak, so as to simulate the presence of an inflection. As before, I took this value for rms and the corresponding values in time for relative power and correlation, and I plotted bar-plots and scatter-plots to observe the trend of the parameters.
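The simulated inflection for spindles without a visible one can be sketched as follows. The function name `simulated_inflection_index` is hypothetical, and the sketch assumes that the "distance" between minimum and peak is measured in samples along the time axis.

```python
def simulated_inflection_index(rms, start, peak):
    """Index a quarter of the way, in time, from the pre-peak minimum to the peak.

    `rms` is a list of per-sample rms values, `start` the first sample of the
    spindle and `peak` the sample where the rms reaches its maximum.
    """
    segment = rms[start:peak + 1]
    # First occurrence of the minimum between the spindle start and the peak
    min_idx = start + segment.index(min(segment))
    return min_idx + round((peak - min_idx) / 4)
```

The rms value at the returned index, together with the relative power and correlation at the same index, would then be the three values collected for that spindle.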

7.1 Bar-plots for rms inflections

To define a reasonable threshold, it is important to know which values the three parameters assume most often at the rms inflections.

Table 7.1: Rms inflections: rms bar-plot

Class Frequency

0 0

1 25

2 146

3 147

4 129

5 95

6 53

7 59

8 47

9 42

10 33

11 17

12 20

13 15

14 1

15 2

16 3

17 1

18 1

19 1

20 0

Other 0

Figure 7.2: Rms inflections: rms bar-plot

Most rms values fall in the range between 1 and 13.


Table 7.2: Rms inflections: correlation bar-plot

Class Frequency

-0.7 1

-0.6 0

-0.5 0

-0.4 1

-0.3 0

-0.2 1

-0.1 15

0 31

0.1 73

0.2 118

0.3 122

0.4 112

0.5 95

0.6 67

0.7 79

0.8 69

0.9 38

1 16

Other 0

Figure 7.3: Rms inflections: correlation bar-plot

Most correlation values fall in the range between 0 and 0.9.


Table 7.3: Rms inflections: relative power bar-plot

Class Frequency

0 0

0.1 285

0.2 208

0.3 153

0.4 85

0.5 60

0.6 26

0.7 14

0.8 6

0.9 2

Other 1

Figure 7.4: Rms inflections: relative power bar-plot

Most relative power values fall in the range between 0.1 and 0.6.

Now we can try the algorithm with some ranges based on the results obtained with the

graphs. At first, it is better to define wide ranges that do not exclude too many spindles.

These are the results.

Table 7.4: Rms inflections: results with modified thresholds

Relative power | Correlation | Rms | Detected | Matching | Not detected
0.1 - ∞ | 0.05 - ∞ | 1.5 - ∞ | 2171 | 714 | 124
0.1 - 0.5 | 0.05 - 0.9 | 0.5 - 13 | 2103 | 657 | 181

The results are not very encouraging, so we could try to find some mathematical relations between the parameters.

7.2 Scatter-plots for rms inflections

To look for some functional relations between the parameters, once again it is helpful to use

scatter plots.


Figure 7.5: Scatter plot relative power-correlation of spindle inflections

Figure 7.6: Scatter plot relative power-rms of spindle inflections


Figure 7.7: Scatter plot correlation-rms of spindle inflections

From these images, there appears to be an exponential relationship between correlation and rms. In the other two graphs it is not easy to see a clear relation because the points are more scattered, but we can try to imagine a linear relation with a lot of outliers. Now we can use these functions to give some functional ranges to the algorithm.

7.3 Range computed with mathematical functions

The first relation that I have taken into consideration is the exponential one between correlation and rms. The equation calculated by Excel is:

$$y = 1.8227\,e^{1.8834x}$$

where x is the correlation and y the rms. With the help of GeoGebra, I have defined one exponential upper limit

$$y = 4\,e^{1.9x}$$

and one exponential lower limit

$$y = e^{1.5x} - 1$$


Figure 7.8: Rms inflections: exponential relation and range between correlation and rms

Now we can test the algorithm adding these limits (first line of the table) and compare the

results with the ones without the limits (second line of the table).

Table 7.5: Rms inflections: results with exponential relation between correlation and rms

Relative power | Correlation | Rms | Detected | Matching | Not detected
0.1 - ∞ | 0.05 - ∞ | 1.5 - ∞ | 2139 | 706 | 132
0.1 - ∞ | 0.05 - ∞ | 1.5 - ∞ | 2171 | 714 | 124

Like the logarithmic relation between relative power and correlation computed for rms peaks, this approach also seems not to work. We could try to add the two linear relations seen in the scatter-plots for the other parameters. Regarding the relation between relative power and correlation, the starting point is the function

$$y = 1.1219x + 0.1564$$

where x is the relative power and y the correlation. I have taken as upper limit the function

$$y = 1.12x + 0.7$$

and this as lower limit

$$y = 1.12x - 0.5$$


Figure 7.9: Rms inflections: linear relation and range between relative power and correlation

Now we can test the algorithm adding these limits (first line of the table) and compare the results with the ones obtained with the exponential limits for correlation and rms (second line of the table).

Table 7.6: Rms inflections: results with linear relation between relative power and correlation

Relative power | Correlation | Rms | Detected | Matching | Not detected
0.1 - ∞ | 0.05 - ∞ | 1.5 - ∞ | 2103 | 686 | 152
0.1 - ∞ | 0.05 - ∞ | 1.5 - ∞ | 2139 | 706 | 132

This restriction does not provide satisfying results either. We could add the last range, based on a linear relation between relative power and rms. The linear function computed by Excel is

$$y = 11.536x + 2.3996$$

where x is the relative power and y the rms. I have taken as upper limit the function

$$y = 1.12x + 0.7$$

and this as lower limit

$$y = 1.12x - 0.5$$

Figure 7.10: Rms inflections: linear relation and range between relative power and rms

Now we can test the algorithm adding these limits (first line of the table) and compare the

results with the ones with the linear limits for relative power and correlation (second line of

the table) and with the exponential limits for correlation and rms (third line of the table).

Table 7.7: Rms inflections: results with linear relation between relative power and rms

Relative power | Correlation | Rms | Detected | Matching | Not detected
0.1 - ∞ | 0.05 - ∞ | 1.5 - ∞ | 2052 | 644 | 194
0.1 - ∞ | 0.05 - ∞ | 1.5 - ∞ | 2103 | 686 | 152
0.1 - ∞ | 0.05 - ∞ | 1.5 - ∞ | 2139 | 706 | 132

We can see that using mathematical functions as ranges does not improve the performance. Looking at the last table, I notice that the number of detected spindles does not decrease much, while the number of matches does, as happened before with the logarithmic function for rms peaks.

To understand why this approach does not work, my first thought was to check whether these relations are maintained throughout the entire night and not just during spindle events.


Chapter 8

Analysis on some rms inflections during the night

To test the hypothesis that the relations between the parameters are maintained during the whole night, I took 838 points of the track (to keep the same number of samples) where the rms has an inflection and where I know there is no spindle. I then plotted scatter-plots to compare them with the ones plotted in the presence of a spindle.

Figure 8.1: Scatter plot relative power-correlation of spindle inflections


Figure 8.2: Scatter plot relative power-correlation of no-spindle inflections

Figure 8.3: Scatter plot relative power-rms of spindle inflections


Figure 8.4: Scatter plot relative power-rms of no-spindle inflections

Figure 8.5: Scatter plot correlation-rms of spindle inflections


Figure 8.6: Scatter plot correlation-rms of no-spindle inflections

Looking at these graphs, we can see that the relations between the parameters where there is a spindle event and where there is none are very similar. This can be the reason why the mathematical ranges calculated before do not produce good results. Maybe we should focus on an automatic method to customize the thresholds, giving the algorithm one value for each parameter that must be exceeded to consider an event a spindle; an upper limit may also be useful. We could try to compute these thresholds on the basis of the first ten or twenty spindles marked by the neurologist, and use them to detect the others.


Chapter 9

Confidence interval on the mean to compute the thresholds

The confidence interval on the mean is a statistical term used to describe the range of values in which the true mean is expected to fall, based on your data and confidence level. The most commonly used confidence level is 95 percent, meaning that, over repeated sampling, 95 percent of the intervals calculated in this way would contain the true mean. To calculate the confidence interval, you need to know the mean of your data set, the standard deviation, the sample size and your chosen confidence level.
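A minimal sketch of this computation is shown below. It uses a normal approximation for the critical value, which is an assumption of the sketch: the distribution used in the project is not stated here, and for small samples such as 10 or 20 spindles a Student's t quantile would give a somewhat wider interval. The function name `mean_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, upper) limits of the confidence interval on the mean."""
    n = len(values)
    m = mean(values)
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the mean, using the sample standard deviation
    half_width = z * stdev(values) / math.sqrt(n)
    return m - half_width, m + half_width
```

The lower limit of such an interval, computed on the values of one parameter, is what would then be passed to the detector as the threshold for that parameter.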

After calculating the confidence interval, I ran the algorithm using the lower limits as thresholds for every parameter. The performance seems to improve. To quantify this improvement, I relied on the F1-score.

9.1 F1-score as a measure of performance

In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a

measure of a test’s accuracy. It considers both the precision p and the recall r of the test

to compute the score: p is the number of correct positive results divided by the number of

all positive results returned by the classifier, and r is the number of correct positive results

divided by the number of all relevant samples (all samples that should have been identified

as positive). The F1 score is the harmonic mean of the precision and recall, where an F1

score reaches its best value at 1 (perfect precision and recall) and worst at 0.

$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

$$\mathrm{Precision} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalsePositive}}$$

$$\mathrm{Recall} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalseNegative}}$$

In this project, every event is cataloged as follows:

- True Positive (TP): spindle events detected as spindle events

- True Negative (TN): no-spindle events detected as no-spindle events

- False Positive (FP): no-spindle events detected as spindle events

- False Negative (FN): spindle events detected as no-spindle events

Every event, however, is not just a point in the EEG track, so it is not easy to classify. For this purpose, it is useful to divide the track into windows of a few seconds. In this way, we can classify events as follows:

- True Positive (TP): windows containing spindle events detected as windows containing spindle events

- True Negative (TN): windows containing no-spindle events detected as windows containing no-spindle events

- False Positive (FP): windows containing no-spindle events detected as windows containing spindle events

- False Negative (FN): windows containing spindle events detected as windows containing no-spindle events

As window length I chose five seconds, which seems to me a good compromise between the duration of spindles, usually one or two seconds, and the high number of True Negatives present in the track. In fact, considering a sleep of almost 8 hours, if we do not consider the 838 real spindles and the ones improperly detected, the rest of the track is formed only by TN. Naturally, if I change the window in which I compute the matching, the results change. This is not a problem, because the F1-score is used to compare the initial YASA algorithm with the modified one, so it is enough to recalculate the matching for the initial algorithm and compute its F1-score.
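The window-based classification can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and each event is assigned to the single window that contains its start time, which ignores events that cross a window border.

```python
import math

def label_windows(event_starts_s, track_duration_s, window_s=5.0):
    """Return one boolean per window: True when an event starts inside it."""
    n_windows = math.ceil(track_duration_s / window_s)
    labels = [False] * n_windows
    for start in event_starts_s:
        idx = int(start // window_s)
        if 0 <= idx < n_windows:
            labels[idx] = True
    return labels

def confusion_counts(real_starts_s, detected_starts_s, track_duration_s, window_s=5.0):
    """Return (TP, FP, FN, TN) counted over windows."""
    real = label_windows(real_starts_s, track_duration_s, window_s)
    det = label_windows(detected_starts_s, track_duration_s, window_s)
    tp = sum(r and d for r, d in zip(real, det))
    fp = sum((not r) and d for r, d in zip(real, det))
    fn = sum(r and (not d) for r, d in zip(real, det))
    tn = sum((not r) and (not d) for r, d in zip(real, det))
    return tp, fp, fn, tn
```

With five-second windows, an 8-hour track contains 5760 windows, which shows why the True Negatives dominate the count.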

These are the results of the initial YASA algorithm with the new matching:

Table 9.1: Initial YASA algorithm results with F1-score

Real spindles | Detected spindles | Matching | Not detected spindles | F1-score
838 | 904 | 422 | 416 | 0.48
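The F1-score in Table 9.1 can be reproduced from the counts in the table with the short sketch below. It assumes that the true positives are the matching spindles, the false positives are the detected spindles that do not match, and the false negatives are the not detected spindles; the function name `f1_from_counts` is hypothetical.

```python
def f1_from_counts(detected, matching, real):
    """F1-score from the number of detected, matching and real spindles."""
    tp = matching
    fp = detected - matching   # detections with no real counterpart
    fn = real - matching       # real spindles that were missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(round(f1_from_counts(detected=904, matching=422, real=838), 2))  # 0.48
```

Under these assumptions the formula simplifies to 2 · matching / (detected + real), which gives 844 / 1742 ≈ 0.48 for the initial algorithm.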

Now we can compare these results with the ones obtained by computing the thresholds with the confidence interval, using different confidence levels.

Table 9.2: Results using confidence interval

Confidence level | Detected spindles | Matching | Not detected spindles | F1-score
80 % | 771 | 466 | 372 | 0.58
90 % | 776 | 472 | 366 | 0.58
95 % | 801 | 475 | 363 | 0.58

We can notice that the F1-score has increased by 0.1. This means that calculating the thresholds with the confidence interval on the mean is a good approach and helps us to increase the performance of the detector.

9.2 Personalizing the algorithm knowing only some spindles per patient

Now that we have found a good way to set the thresholds, it is important for the aim of the project to try to detect the spindles by computing the thresholds from only the first 10 or 20 spindles annotated by the neurologist. The confidence interval is calculated only on these spindles. These are the results.

Table 9.3: Results using confidence interval with 10/20 spindles in input

Input spindles | Confidence level | Detected spindles | Matching | Not detected spindles | F1-score
10 | 80 % | 2321 | 761 | 77 | 0.48
10 | 70 % | 2080 | 738 | 100 | 0.51
10 | 50 % | 1738 | 701 | 137 | 0.54
20 | 95 % | 1942 | 724 | 114 | 0.52
20 | 80 % | 1549 | 659 | 179 | 0.55
20 | 50 % | 1268 | 608 | 230 | 0.58

As we could expect, the number of detected spindles is higher than before, because the confidence interval is computed on less data and is therefore less accurate. However, as the confidence level decreases, the F1-score improves because, although the number of matches is no longer as high, the algorithm detects fewer false positives.


Chapter 10

Preliminary analysis for adaptive method to compute thresholds

The aim of the project has been reached with the use of the confidence interval on the mean. However, further analysis can be carried out to improve the performance. A possible approach is an adaptive method to set the thresholds. The idea is to fix one threshold and, at each point of the track where that threshold is exceeded, compute the other two on the basis of the value assumed at that moment by the parameter whose threshold has been fixed. To understand whether this method could be effective, it is important to perform some extra analysis on the parameters.

10.1 Exploration of the relationship between parameters

To compute two thresholds on the basis of the value assumed by the other parameter, there should be a relation between the three variables. To test this relation, I computed a polynomial regression with relative power and correlation as independent variables and rms as target variable. In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth degree polynomial in x. I chose degree 2 because the F1-score does not improve with increasing degree, while degree 1 is not very accurate. I performed the regression with and without removing outliers. To remove them, I used Cook's distance, which is used in regression analysis to identify influential outliers, since these are believed to negatively affect the model. The distance combines the leverage and the residual of each value: the higher the leverage and the residual, the higher the Cook's distance. An outlier is detected when

$$\text{Cook's distance} > \frac{4}{n - p}$$

where p is the number of variables and n is the dataset size.

To determine how well the model fits the data, I have used R-squared (R²), also known as the coefficient of determination. It is the proportion of the variance in the dependent variable that is predictable from the independent variables. It is a number between 0 and 1, where 0 indicates that the model explains none of the variability of the response data around its mean, while 1 indicates that the model explains all the variability of the response data around its mean. So the higher the R-squared value, the better the model fits the data.
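As a small illustration of this definition, R² can be computed from the observed values and the values predicted by any fitted model. The function name `r_squared` is hypothetical and the sketch is independent of the regression library used in the project.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: what the model fails to explain
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: variability of the data around its mean
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot
```

A model that always predicts the mean of the data gets 0, and a model that reproduces every observation gets 1.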

10.2 Polynomial regression on spindle inflections

First of all, I have computed the regression considering the inflections and minimum of the

spindles. These are the results:

Table 10.1: Rms inflections and minimum: R² in polynomial regression

Removal outliers | Number of spindles | R²
No | 838 | 0.56
Yes | 769 | 0.57

Then, I separated the inflections from the minima and repeated the regression:

Table 10.2: Rms inflections and minimum separated: R² in polynomial regression

Type | Removal outliers | Number of spindles | R²
Inflections | No | 528 | 0.56
Inflections | Yes | 496 | 0.61
Minimum | No | 310 | 0.56
Minimum | Yes | 292 | 0.58

Looking at these tables, we can notice that removing the outliers does not change the R² value much and that inflections and minima can be considered together in our analysis. However, the R-squared coefficient is not very high. We could try to perform the regression considering all the points of the spindle and not only its inflection or minimum:

Table 10.3: Total spindle durations: R² in polynomial regression

Removal outliers | Number of total points | R²
No | 174125 | 0.69
Yes | 160726 | 0.72


From these results, we can see that the polynomial regression works better on the whole spindle duration than on a single point of the event. The R-squared coefficient is good, so we can conclude that a relation exists between the three parameters when a spindle occurs. We can therefore try the approach of adapting the thresholds during the night.


Chapter 11

Adaptive method to compute thresholds

To decide for which parameter to fix the threshold, I computed some regressions of degree 2. I related the parameters two by two to understand which regression fits the data best.

Table 11.1: Relations between the parameters: R² in regression

Independent variable | Dependent variable | R² with outliers | R² without outliers
relative power | rms | 0.28 | 0.32
relative power | correlation | 0.39 | 0.47
rms | relative power | 0.30 | 0.27
rms | correlation | 0.62 | 0.62
correlation | relative power | 0.48 | 0.46
correlation | rms | 0.52 | 0.54

Looking at this table, I think that the parameter to fix is correlation, because when it is used as the independent variable both R-squared coefficients are fairly good (between 0.46 and 0.54). To set its threshold, I computed the confidence interval at different confidence levels to see which works best. As input I used either only the inflection or minimum point of each spindle, or all the points of the spindle, taken from the first 10 or 20 spindles. Then, for every point of the track where this threshold is exceeded, I calculated the other two thresholds using the regression coefficients computed before, and checked whether these new thresholds are also exceeded. The results are shown in Table 11.2.


Table 11.2: Adaptive method: correlation fixed

Input spindles  Inflection / all spindle  Confidence level  Detected spindles  Matching spindles  Not detected  F1-score
10              all spindle               50 %              334                234                604           0.40
10              all spindle               90 %              343                239                599           0.40
10              inflection                50 %              682                351                487           0.46
10              inflection                90 %              850                375                463           0.44
20              inflection                50 %              833                433                405           0.52
20              inflection                90 %              1024               471                367           0.51
20              all spindle               90 %              579                344                494           0.49
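The per-point check behind this table (fix one threshold, derive the other two from the regression, then test all three) can be sketched as follows. This is one possible reading of the procedure, under stated assumptions: the derived thresholds are taken to be the values of the fitted degree-2 polynomials at the current correlation value, "exceeded" is taken as strictly greater, and the names `predict` and `adaptive_mask` as well as all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Quadratic = Tuple[float, float, float]  # (a, b, c) of a*x**2 + b*x + c


def predict(coeffs: Quadratic, x: float) -> float:
    """Evaluate the fitted degree-2 polynomial at x."""
    a, b, c = coeffs
    return a * x * x + b * x + c


def adaptive_mask(
    corr: Sequence[float],
    rms: Sequence[float],
    rel_pow: Sequence[float],
    corr_threshold: float,
    corr_to_rms: Quadratic,
    corr_to_rel_pow: Quadratic,
) -> List[bool]:
    """Mark the points where all three thresholds are exceeded."""
    mask = []
    for c, r, p in zip(corr, rms, rel_pow):
        if c <= corr_threshold:
            # The fixed threshold is not exceeded: no need to derive the others.
            mask.append(False)
            continue
        # Assumption: the other thresholds are the regression predictions at c.
        rms_threshold = predict(corr_to_rms, c)
        rel_pow_threshold = predict(corr_to_rel_pow, c)
        mask.append(r > rms_threshold and p > rel_pow_threshold)
    return mask


# Synthetic example (illustrative values only).
mask = adaptive_mask(
    corr=[0.50, 0.80, 0.90],
    rms=[2.0, 1.0, 3.0],
    rel_pow=[0.30, 0.30, 0.25],
    corr_threshold=0.60,
    corr_to_rms=(0.0, 2.0, 0.0),
    corr_to_rel_pow=(0.0, 0.0, 0.2),
)
print(mask)
```

Consecutive marked points would then still have to be grouped into events and compared with the annotated spindles, which is not shown here.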

Some of these F1-scores are better than that of the initial YASA algorithm, but none is better than those of the algorithm whose thresholds are computed only with the confidence interval. Correlation may not be the best parameter to fix, so we can try fixing one of the other two. Moreover, an upper limit for the parameters could be introduced; it can be computed using the confidence interval on the parameter values corresponding to the RMS peaks.
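The confidence interval on the mean, used both for the thresholds and for the upper limits, can be sketched as follows. This is a simplified illustration: it uses the normal approximation (with only 10 or 20 spindles, a Student t quantile would give a slightly wider interval), the function name `mean_confidence_interval` and the sample values are hypothetical, and which bound of the interval serves as threshold or upper limit is left open, so both bounds are returned.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def mean_confidence_interval(values: Sequence[float], level: float) -> Tuple[float, float]:
    """Two-sided confidence interval on the mean (normal approximation)."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    if len(values) < 2:
        raise ValueError("at least two values are needed")
    # Quantile of the standard normal for the requested two-sided level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * stdev(values) / sqrt(len(values))
    centre = mean(values)
    return centre - half_width, centre + half_width


# Synthetic parameter values for the first 10 annotated spindles (illustrative).
first_spindles = [0.71, 0.78, 0.69, 0.82, 0.75, 0.80, 0.73, 0.77, 0.74, 0.79]
for level in (0.50, 0.90):
    low, high = mean_confidence_interval(first_spindles, level)
    print(f"{level:.0%}: [{low:.3f}, {high:.3f}]")
```

A higher confidence level widens the interval, which is why the 50 % and 90 % rows of the tables lead to different numbers of detected spindles.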

Table 11.3: Adaptive method: rms fixed

Input spindles  Inflection / all spindle  Confidence level  Detected spindles  Matching spindles  Not detected  F1-score
10              all spindle               90 %              673                348                490           0.46
10              all spindle               99 %              701                358                480           0.47
10              all spindle               50 %              640                333                505           0.45
10              inflection                50 %              1714               481                357           0.38
10              inflection                90 %              1808               475                363           0.36
20              inflection                90 %              1568               454                384           0.38
20              inflection                50 %              1499               449                389           0.38


Table 11.4: Adaptive method: rms fixed with upper limit

Input spindles  Inflection / all spindle  Confidence level  Detected spindles  Matching spindles  Not detected  F1-score
10              inflection                90 %              994                135                703           0.15
20              inflection                90 %              1542               438                400           0.37
20              all spindle               90 %              798                362                476           0.44
20              all spindle               95 %              806                362                476           0.44
20              all spindle               50 %              777                360                478           0.45

Table 11.5: Adaptive method: relative power fixed

Input spindles  Inflection / all spindle  Confidence level  Detected spindles  Matching spindles  Not detected  F1-score
10              all spindle               50 %              1555               561                277           0.47
10              all spindle               90 %              1587               563                275           0.46
10              inflection                90 %              3283               729                109           0.35
10              inflection                50 %              3079               738                100           0.38
20              inflection                50 %              2902               691                147           0.37
20              inflection                90 %              3272               700                138           0.34
20              all spindle               90 %              1652               563                275           0.45
20              all spindle               50 %              1617               559                279           0.46

Table 11.6: Adaptive method: relative power fixed with upper limit

Input spindles  Inflection / all spindle  Confidence level  Detected spindles  Matching spindles  Not detected  F1-score
10              inflection                90 %              3314               693                145           0.33
10              inflection                50 %              3016               625                213           0.32
20              inflection                50 %              2909               585                253           0.31
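The F1-score column in Tables 11.2 to 11.6 is consistent with the usual definition computed from the other three counts, treating matching spindles as true positives, detected but unmatched spindles as false positives, and not detected spindles as false negatives. A minimal sketch, with the hypothetical function name `f1_score`:

```python
def f1_score(detected: int, matching: int, not_detected: int) -> float:
    """F1-score from the counts reported in the tables.

    detected:      events returned by the algorithm
    matching:      detected events that match an annotated spindle (true positives)
    not_detected:  annotated spindles that were missed (false negatives)
    """
    false_positives = detected - matching
    denominator = 2 * matching + false_positives + not_detected
    return 2 * matching / denominator if denominator else 0.0


# Row "20, inflection, 50 %" of Table 11.2.
print(round(f1_score(detected=833, matching=433, not_detected=405), 2))  # 0.52
```

In every row, matching plus not detected gives 838, the number of annotated spindles, so a large number of detections raises recall but lowers precision and therefore the F1-score.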

Fixing the threshold of one of the other parameters does not improve the situation. Sometimes the F1-score is better when only the inflections and minima are used to calculate the confidence interval; sometimes the whole spindle is necessary. Where the number of detected spindles was very high, I introduced the upper limit, but it seems to exclude too many true positive events. We could conclude that, for this patient, this adaptive method does not work well.


Chapter 12

Conclusions

The aim of the project has been reached. I have demonstrated that a spindle detection algorithm can be customized for each patient. The best method I tested for this purpose is to calculate the three parameter thresholds on the first 10 spindles (or 20 for greater precision) annotated by the doctor, using the confidence interval on the mean. For the patient on which I tested this method, the F1-score increased from 0.48, obtained with the initial YASA algorithm, to 0.58, obtained using 20 input spindles and a confidence level of 50%. Unfortunately, I had only one file with the neurologist's annotations, so I could not test this method on other patients. However, I have performed analyses that can be replicated on other files, and I have shown that there is a relation between the trends of the three parameters used to detect the spindles.


