ESTIMATION OF PERIODICITY IN NON-UNIFORMLY SAMPLED ASTRONOMICAL DATA - AN APPROACH USING SPATIO-TEMPORAL KERNEL BASED CORRENTROPY
By
BIBHU PRASAD MISHRA
A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
UNIVERSITY OF FLORIDA
2011
© 2011 Bibhu Prasad Mishra
Dedicated to my parents and my younger sister
ACKNOWLEDGMENTS
First, my sincere gratitude goes to my advisor Dr. Jose C. Príncipe for his wonderful
guidance and remarkable patience throughout my research, and to my committee members
Dr. John Harris and Dr. John M. Shea for their guidance and help throughout my graduate
studies. I would like to thank our collaborators Dr. Pavlos Protopapas and Dr. Pablo
A. Estevez for their valuable insight. I would also like to thank Alex and Abhishek for
their help during the initial part of the project, Rakesh for his suggestions, Austin for his
discussions on various topics, and all members of CNEL for their knowledge on a variety of
topics. Last but not least, I would like to thank my parents, my sister and friends for
their constant support and encouragement throughout my life.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
CHAPTER
1 AN INTRODUCTION TO PERIODICITY ESTIMATION IN ASTRONOMICAL DATA ANALYSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.1 Overview of the Astronomical Data . . . . . . . . . . . . . . . . . . . . . . 10
1.2 Introduction to Estimation Techniques . . . . . . . . . . . . . . . . . . . . 11
2 PERIODICITY ESTIMATION TECHNIQUES: A REVIEW . . . . . . . . . . . . 14
2.1 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.1 Spline Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.2 Lomb Periodogram . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.3 Dirichlet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3 PERIODICITY ESTIMATION USING KERNEL BASED METHODS . . . . . . . 21
3.1 Correntropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Spatio-temporal Kernel based Proposed Method . . . . . . . . . . . . . . 23
3.3 Kernel Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4 PERIODICITY ESTIMATION USING SPATIO-TEMPORAL KERNEL BASED CORRENTROPY ON FOLDED TIME SERIES DATA . . . . . . . . . . . . . . . 33
4.1 Period Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Kernel Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
APPENDIX: VARIABLE STEP SIZE . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
BIOGRAPHICAL SKETCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
LIST OF TABLES
Table page
2-1 Comparative performance using interpolation based techniques, Lomb periodogram and Dirichlet transform along with results published by Harvard University, Time Series Center. . . . . . . . . . . . . . . . . . . . . . . . . . 19
3-1 Comparative performance using the proposed 2D kernel based technique and correntropy on the interpolated light curve due to [4], along with results published by Harvard University, Time Series Center. Correctly identified values are marked in bold. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4-1 Comparative performance using the proposed correntropy based technique along with results published by Harvard University, Time Series Center. Correctly identified values are marked in bold. . . . . . . . . . . . . . . . . . . . 40
5-1 Performance evaluation of the existing techniques and the proposed techniques. The results published by Harvard University, Time Series Center have been used as the gold standard for the evaluation. . . . . . . . . . . . . . . . 45
LIST OF FIGURES
Figure page
2-1 The sampled values at non-uniformly spaced time intervals are in blue and interpolated and uniformly re-sampled values in red for different values of p. . . 15
2-2 Frame selection from the light curve 1.3804.164. Note that the y-axis represents the brightness magnitude of the star system. However, the brighter the object appears, the lower the value of its magnitude, as it is customary in astronomy to plot the magnitude scale reversed. . . . . . . . . . . . . . . . . . 16
3-1 Contour and surface plot of CIM(X,Y) with Y=0 in 2D space with a Gaussian kernel and a kernel size equal to 1. . . . . . . . . . . . . . . . . . . . . . . . 23
3-2 Figure illustrating the reason why simple correntropy cannot be directly used in case of non-uniformly sampled data. . . . . . . . . . . . . . . . . . . . . . . 25
3-3 Plot of 2D kernel based measure with varying standard deviation values for the time kernel. In this case light curve 1.3810.19 has a time period of 88.9406 days and light curve 1.3449.27 has a time period of 4.0349 days. . . . . . . . 28
3-4 Plot of 2D kernel based measure with varying standard deviation values for the magnitude kernel. In this case light curve 1.3810.19 is the data set used and has a true time period equal to 88.9406 days. . . . . . . . . . . . . . . . . 29
4-1 Reconstruction of a single period of the signal by breaking the original signal into frames of length equal to the true time period of the signal. . . . . . . . . 34
4-2 Folding performed on a non-uniformly sampled signal with true period equal to 1 unit. Folding has been performed with trial periods equal to 1 unit and 1.3 units. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4-3 Plot of correntropy of the transformed space with varying standard deviation values for the time kernel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4-4 Plot of correntropy of the transformed space with varying standard deviation values for the magnitude kernel. . . . . . . . . . . . . . . . . . . . . . . . . . 39
5-1 Magnitude plot of light channel 1.3810.19. Note that the y-axis represents the magnitude of the star. Magnitude measures the brightness of a celestial object; however, the brighter the object appears, the lower the value of its magnitude. It is customary in astronomy to plot the magnitude scale reversed. . . 43
5-2 Plot of spatio-temporal kernel based correntropy for light curve 1.3448.153. . . 44
5-3 Plot of spatio-temporal kernel based correntropy for light curve 1.3810.19. . . 44
Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science
ESTIMATION OF PERIODICITY IN NON-UNIFORMLY SAMPLED ASTRONOMICAL DATA - AN APPROACH USING SPATIO-TEMPORAL KERNEL BASED CORRENTROPY
By
Bibhu Prasad Mishra
May 2011
Chair: Jose C. Príncipe
Major: Electrical and Computer Engineering
Period estimation in non-uniformly sampled time series data is frequently a goal
in astronomical data analysis. There are various problems faced during estimation
of the period. Firstly, the data is sampled non-uniformly, which makes it difficult to use simple
techniques such as Fourier transform for performing spectral analysis. Secondly, there
are large gaps in data which makes it difficult to interpolate the signal for re-sampling.
Thirdly, in data sets with shorter time periods, the non-uniformity in sampling and
noise in the data pose even greater problems because of the smaller number of samples
per period. Lastly, one of the biggest problems is that the time period of these light
curves cannot be easily identified by periodogram techniques because of the inherent
modulations in the light curve within a single period, which must be
accounted for while estimating the period. Periodogram techniques generally give
a peak at the fundamental frequency, which may not be the frequency corresponding to
the true period but rather correspond to a sub-multiple of it. In the present
work we first discuss a few of the existing methods for period estimation, such as the Fourier transform, Lomb
periodogram and Dirichlet transform, before shifting our focus to kernel
based methods. A new spatio-temporal kernel based cost function has been proposed
which works directly on the non-uniformly sampled data. Furthermore, a spatio-temporal
kernel for correntropy on transformed space has been proposed to estimate the time
period of the data with enhanced accuracy. Finally, the proposed methods are compared
to the existing techniques to highlight the improvement provided by the
kernel based methods.
CHAPTER 1
AN INTRODUCTION TO PERIODICITY ESTIMATION IN ASTRONOMICAL DATA ANALYSIS
Astronomical observations using visual wavelengths are called light curves (i.e.
brightness magnitude over time) and are used to quantify the motion of stars. Of
particular interest is the discrimination between periodic and non-periodic relative
movements and the quantification of the period of light curves obtained from
objects such as eclipsing binaries, RRLs (pulsating variable stars named after RR
Lyrae), cepheids (intrinsically variable stars with exceptionally regular periods of light
pulsation), etc [11].
1.1 Overview of the Astronomical Data
The time series data analyzed in this work comes from photometric astronomical
surveys. These are time series of light intensity collected through various
channels: different telescopes, spectral bands or instruments. Due to
variations in the atmosphere and sky conditions, the collected data is non-uniformly
sampled and noisy. Thus the light curve data comes in three columns: time, flux
and error. The error column gives an estimation of the error of measurement in the
photometric procedure. The MACHO (MAssive Compact Halo Object) survey [1] is
operated with the purpose of searching for the missing dark matter in the galactic halo,
like brown dwarfs or planets. In MACHO the light amplification is caused by the bending of
space around a heavy object, a phenomenon known as microlensing. Currently
existing techniques mostly use the Lomb-Scargle (LS) periodogram [7], [9], which is an
extension of classical periodogram techniques but works with non-uniformly sampled
data. The estimated period given by the LS periodogram is used to "fold" the time series
modulo the estimated value of the period so that the periodic nature of the data is clearly
reduced. Once this is achieved it is possible to perform calculations to obtain a more
precise estimate of the period. This final step known as analysis of variance (AoV) in
astronomy is due to [15]. These values, published by the Time Series Center at Harvard
University, are used as the gold standard for comparison of the various algorithms,
as the final values have been manually inspected by the Time Series Center team.
This process is computationally intensive, and with data being collected from billions of
astronomical objects we need a technique that is more efficient and accurate at the
same time. This inherent difficulty calls for computationally intelligent
techniques [2], [3], [12], [16], [18]. The present work proposes two
algorithms, one using a 2D Gaussian kernel on all pairs of sample points and the other
using an information theoretic approach based on correntropy [6], [10], [13], [17] with a
new spatio-temporal kernel. We will be comparing the results of the current work with
the algorithm proposed in [4] and also with the existing methods involving interpolation,
Lomb periodogram etc.
1.2 Introduction to Estimation Techniques
There are several difficulties that need to be addressed in this area. First, the
data set normally consists of samples taken at non-uniformly spaced
time instants, which prevents the direct use of the Fourier transform to study the
spectral composition of the signal. As the data is non-uniformly sampled, correlation
cannot be directly applied either. One possible alternative is to interpolate the data
and re-sample it periodically before applying the method of choice. The presence of
gaps in the time series creates another problem, as even interpolation will not give
acceptably accurate results. This problem is avoided by simply framing
the time series data and using those frames which have no gaps or only a few
consecutive missing points. Generally, time series data with larger time periods allow
more missing points in a frame and a larger frame length, whereas for data sets
with smaller time periods, smaller frame lengths are used and fewer consecutive
missing samples are allowed. There is also the problem of noise, as each sample
point is accurate only to within a certain variance. To reduce the effect of error
on our estimate of the periodicity of the data, we ignore samples whose variance
exceeds a certain threshold.
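The three-column format and the variance-threshold rule above can be sketched in a few lines. This is an illustrative Python/NumPy sketch, not the thesis code: the function name and the default threshold of 0.1 are hypothetical placeholders, since the text only speaks of "a certain threshold".

```python
import numpy as np

def load_light_curve(rows, error_threshold=0.1):
    """Split (time, flux, error) rows and drop samples whose reported
    error exceeds the threshold (a hypothetical cutoff value)."""
    data = np.asarray(rows, dtype=float)
    t, flux, err = data[:, 0], data[:, 1], data[:, 2]
    keep = err <= error_threshold
    return t[keep], flux[keep]
```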
Thus, in the present work, two categories of techniques for periodicity estimation
have been used. The first category comprises methods that involve
the use of interpolation; these make use of the simple Fourier
transform, correlation and a recently developed kernel based technique known as
correntropy. The second category, which does not require interpolation and
re-sampling but works directly on the non-uniformly sampled data, comprises the Lomb periodogram
[7] and the Dirichlet transform [8]. Although framing and interpolating enable the use of simple
standard techniques such as correlation or the Fourier transform, this approach no longer
uses the original data points directly, as the original information is combined with
interpolation noise. Interpolation noise, along with the inherent noise in the collected data,
further compromises the precision of the period estimate of the light curve. Hence, from
an engineering point of view, it is better to use the samples directly for periodicity estimation
rather than interpolated data. The most popular techniques which
work on the data samples directly, without interpolation, are the Lomb
periodogram and the Dirichlet transform. As we will see later, even periodogram based
techniques have drawbacks owing to the nature of the light curves. In the current work
the data set used has been obtained from eclipsing binary star systems. In these
systems there are two eclipses per cycle, which gives rise to a modulation effect, and
hence periodogram based methods tend to give a peak at frequencies corresponding to
sub-multiples of the true time period. These problems are addressed by the
proposed kernel based methods.
The rest of the chapters are organized as follows: Chapter 2 deals with methods
which involve interpolation and re-sampling of the data, as well as techniques such as
the Lomb periodogram and Dirichlet transform which work directly on the non-uniformly
sampled data. Chapter 3 introduces the concept of correntropy and deals with the
algorithm proposed in [4] and a new proposed technique based on spatio-temporal
Gaussian kernel. Chapter 4 deals with the final proposed algorithm which uses
spatio-temporal kernel based correntropy on a transformed space. Chapter 5 concludes
the work and discusses the potential problems which can be addressed in the future to
further improve the period estimation techniques.
CHAPTER 2
PERIODICITY ESTIMATION TECHNIQUES: A REVIEW
This chapter deals with some of the existing techniques which are useful for
estimation of period of astronomical light curves. These methods can be broadly divided
into two categories. The first category encompasses the techniques which need data to
be uniformly sampled in time for analysis. The second category deals with the methods
which use the non-uniformly sampled data directly for analysis. The first category
requires uniformly sampled data, which we do not have; the best way to obtain it
is to interpolate the non-uniformly sampled data and then re-sample the interpolated
curve at regular intervals. In the first category of techniques we will use the re-sampled data for
Fourier analysis and autocorrelation for estimating the period. In the second category
of methods we use Lomb periodogram and Dirichlet transform for estimation purposes.
Before proceeding further into the implementation of these various signal processing
tools, we first briefly describe the methods themselves.
2.1 Theory
This section deals with spline interpolation, the Lomb periodogram and the Dirichlet
transform, in that order.
2.1.1 Spline Interpolation
As the data is sampled at non-uniformly spaced time instants, we use
interpolation [5] to make it uniform for analytical purposes and then re-sample the data at
uniformly spaced intervals. There are various kinds of interpolation, such
as linear, polynomial and spline interpolation, of which spline
interpolation is the most commonly used. For experimental purposes we have used
cubic spline interpolation to interpolate the signal from the given data. Equation
2–1 gives the cost function which is minimized for interpolation of the
data.

Figure 2-1. The sampled values at non-uniformly spaced time intervals are in blue and interpolated and uniformly re-sampled values in red for different values of p. A) p = 1.0. B) p = 0.5.
I(p) = p ∑_{j=1}^{n} w(j) |x(j) − f(t(j))|² + (1 − p) ∫ |D²f(t)|² dt (2–1)
The first term, under the summation, controls the error, while the integral
term controls the smoothness of the spline; D denotes the derivative
operator. p is the smoothness parameter: as p varies from 0 to 1, the smoothing
spline changes from one extreme to the other. In the summation term, w(j) represents
the importance given to the error at each sampled instant. The data set used for
simulation specifies a variance at each sampled instant, which gives an estimate of the
error at that instant. In our case we used w(j) = 1. For interpolation we used
p = 0.5, which strikes a balance between reducing the interpolation error
and increasing the smoothness of the interpolated signal. Taking p = 1 gave spikes in the
interpolated signal where there are gaps, as shown in Figure 2-1, compared to the
plot for p = 0.5.
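As a concrete illustration of this procedure, the sketch below fits a cubic smoothing spline to non-uniform samples and re-samples at 20 samples per day (the rate used later in this chapter). It uses SciPy's UnivariateSpline, whose smoothing factor s trades error against smoothness in a role analogous to, but parameterized differently from, the p in Equation 2–1; the synthetic signal and the value s = 0.01 are made up for the example.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
# Non-uniform sampling times of a smooth periodic signal (made up for the example).
t = np.sort(rng.uniform(0.0, 10.0, 200))
x = np.sin(2 * np.pi * t / 5.0)

# Cubic smoothing spline (k=3). SciPy's `s` bounds the summed squared
# residuals, trading error against smoothness.
spl = UnivariateSpline(t, x, k=3, s=0.01)

# Re-sample uniformly at 20 samples per day.
t_uniform = np.arange(t[0], t[-1], 1.0 / 20.0)
x_uniform = spl(t_uniform)
```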
Figure 2-2. Frame selection from the light curve 1.3804.164. Note that the y-axis represents the brightness magnitude of the star system. However, the brighter the object appears, the lower the value of its magnitude, as it is customary in astronomy to plot the magnitude scale reversed.
The interpolated data thus obtained has been used with the Fourier transform and the
auto-correlation function (ACF) to estimate the period. In both cases, frames
were first chosen from the light curve such that there were at least 100 points in the frame
and the gap length in the frame did not exceed 10 days (the unit of time in the data is
days). Figure 2-2 shows the frame selection from the light curve 1.3804.164.
Then interpolation was performed as described above, and re-sampling
was done at a rate of 20 samples per day. For the Fourier transform, the
re-sampled data is Hamming windowed and an N-point FFT is performed, where N is
the lowest power of 2 greater than or equal to the total number of points in the
re-sampled data set. The peak in the FFT plot is used to estimate the time period of the
light curve by simply inverting the frequency value at the peak. For the ACF, as the
name suggests, auto-correlation is performed on the interpolated data. The largest
peak other than the peak at zero lag is then identified, and the lag value at that peak is the
estimated value of the time period.
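The two estimators just described can be sketched as follows, assuming uniformly re-sampled input. This is an illustrative NumPy sketch, not the thesis code; in particular, skipping lags up to the first sign change of the ACF is our own simplification for locating the largest non-zero-lag peak.

```python
import numpy as np

def period_fft(x, fs):
    """Estimate the period from the dominant FFT peak.

    x  : uniformly re-sampled magnitudes
    fs : sampling rate (samples per day)
    """
    x = np.asarray(x, float) - np.mean(x)
    n = int(2 ** np.ceil(np.log2(len(x))))           # lowest power of 2 >= L
    spec = np.abs(np.fft.rfft(x * np.hamming(len(x)), n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    k = np.argmax(spec[1:]) + 1                      # skip the DC bin
    return 1.0 / freqs[k]

def period_acf(x, fs):
    """Estimate the period from the largest ACF peak away from zero lag."""
    x = np.asarray(x, float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Heuristic: skip lags up to the first sign change so the zero-lag
    # peak and its shoulder are disregarded.
    neg = np.nonzero(acf < 0)[0]
    start = neg[0] if len(neg) else 1
    return (start + np.argmax(acf[start:])) / fs
```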
2.1.2 Lomb Periodogram
Traditional methods of spectral analysis need the signal to be uniformly sampled.
The Lomb periodogram [7], however, does not need the samples to be evenly spaced and
hence can be used directly on our data. It also allows examining frequencies
higher than the mean Nyquist frequency i.e. the Nyquist frequency obtained by evenly
spacing the same number of data points at the mean sampling rate. The sole reason
for using periodogram analysis is that it provides a reasonably good approximation to
the spectrum obtained by fitting sine waves by least squares to the data and plotting
the reduction in the sum of residuals against frequency. This least squares spectrum
provides the best measure of the power contributed by the different frequencies to
the variance of the data and can be regarded as a natural extension of Fourier methods to
non-uniform data. It reduces to the Fourier power spectrum in the limit of equal spacing.
The Lomb periodogram for a zero-mean time series x(tₙ) is defined as follows:

P(ω) = (1/(2σ²)) {C(ω) + S(ω)} (2–2)

where

σ² = Var(x(tₙ)) (2–3)

C(ω) = [∑_{n=1}^{N} x(tₙ) cos ω(tₙ − τ(ω))]² / ∑_{n=1}^{N} cos² ω(tₙ − τ(ω)) (2–4)

S(ω) = [∑_{n=1}^{N} x(tₙ) sin ω(tₙ − τ(ω))]² / ∑_{n=1}^{N} sin² ω(tₙ − τ(ω)) (2–5)

and

τ(ω) = (1/(2ω)) arctan(∑_{n=1}^{N} sin 2ωtₙ / ∑_{n=1}^{N} cos 2ωtₙ) (2–6)
is an offset which makes the periodogram translation invariant. In our case we
have used this periodogram formula on each of the frames without interpolation and
re-sampling. The Lomb periodogram is computed for frequencies up to f = 2,
and the inverse of the peak frequency is the estimated period of the time series.
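As a sketch of this procedure, SciPy provides a Lomb periodogram implementation (scipy.signal.lombscargle, which expects angular frequencies). The synthetic light curve and the grid density below are made up for the example; frequencies are scanned up to f = 2, as in the text.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 100.0, 400))   # non-uniform sampling times (days)
x = np.sin(2 * np.pi * t / 8.0)             # synthetic light curve, true period 8 days
x = x - x.mean()                            # the periodogram assumes a zero-mean series

# Scan frequencies up to f = 2 cycles/day.
f = np.linspace(1.0 / 2048.0, 2.0, 4096)
power = lombscargle(t, x, 2 * np.pi * f)    # scipy expects angular frequencies
period = 1.0 / f[np.argmax(power)]
```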
2.1.3 Dirichlet Transform
This transform generalizes the Z-transform and is better suited for the analysis of
non-uniformly sampled data. The Dirichlet transform [8] preserves information about the
sampling instants because it does not simply consider x(tₙ) as a sequence of samples but
as a function of the time instants tₙ. The Dirichlet transform is defined as follows:
X(p) = D[x(tₙ)] = ∑_{n=0}^{∞} x(tₙ) e^{−ptₙ} (2–7)
where p is a complex variable defined as p = σ + jω. For uniform sampling
this becomes equivalent to the Z-transform with z = e^{pτ}, where τ is the uniform sampling
period. The Dirichlet transform is likewise computed for frequencies up to f = 2. To
calculate the time period of the signal, the peak in the Dirichlet transform is identified and
its inverse is taken.
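Equation 2–7 evaluated on the imaginary axis p = j2πf can be sketched directly. This is an illustrative NumPy sketch; the function name is our own.

```python
import numpy as np

def dirichlet_transform(t, x, freqs):
    """|X(p)| of Equation 2-7 evaluated on the imaginary axis p = j*2*pi*f."""
    t = np.asarray(t, float)[None, :]                  # shape (1, N)
    w = 2 * np.pi * np.asarray(freqs, float)[:, None]  # shape (F, 1)
    return np.abs(np.exp(-1j * w * t) @ np.asarray(x, float))
```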
2.2 Results
In this section we present results obtained using four techniques. The first two,
the Fourier transform and auto-correlation, are used along with cubic spline
interpolation as described in Section 2.1.1. The last two, the Lomb periodogram
and Dirichlet transform, work directly on the non-uniformly sampled data, as mentioned
earlier. The data set has been obtained from the MACHO survey, and the unit of time in the
data set is a day.

Table 2-1. Comparative performance using interpolation based techniques, Lomb periodogram and Dirichlet transform along with results published by Harvard University, Time Series Center.

Light curve     Harvard   Fourier     Auto-        Lomb         Dirichlet
blue channel    -TSC      transform   correlation  periodogram  transform
1.3810.19       88.9406   44.2811     89.05        44.5217      44.5217
1.4411.612      45.1143   22.5986     473.6        22.5055      22.5055
1.4168.434      43.9301   22.1405     87.5         22.0215      22.0215
1.3809.1058     28.9073   14.4991     28.85        14.4225      14.4225
1.4652.565      27.5718   13.7681     27.65        13.745       13.745
1.4288.975      17.6131   8.8086      35.25        8.7897       8.7897
1.4539.778      16.2502   15.7538     31.2         8.1270       8.1270
1.4173.1409     14.1534   14.1241     113.8        7.0865       7.0865
1.3449.948      14.0064   7.0621      70.05        7.0137       7.0137
1.4174.104      8.4929    8.5333      85.05        4.249        4.249
1.4538.81       5.5343    81.92       50.0         2.7676       2.7676
1.3564.163      4.7155    34.1333     198.2        1.179        1.179
1.3804.164      4.1875    38.0718     62.975       2.0941       2.0941
1.3449.27       4.0349    102.4       10.35        2.0177       2.0177
1.3448.153      3.2765    17.0667     16.1         3.2768       3.2768
1.4539.37       2.9955    68.2667     40.2         1.4982       1.4982
1.3442.172      1.02059   22.7556     29.9333      0.5103       0.5103
1.3325.93       0.95176   19.5048     20.15        0.9517       0.9517
1.3444.880      0.90286   19.0615     29.0250      4.708        4.708
1.3447.783      0.83615   159.9390    76.96        0.7183       0.7183
To apply the interpolation based techniques mentioned above, a frame of data is
first selected having at least 100 sample points and no gaps greater
than 10 days. It is possible to obtain more than one frame
of data from a light curve data set; in those cases we simply average the
values obtained from a particular technique over the frames.
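The frame-selection rule (at least 100 samples, no gap longer than 10 days) can be sketched as follows. This is an illustrative NumPy sketch with hypothetical names, not the thesis code.

```python
import numpy as np

def select_frames(t, x, min_points=100, max_gap=10.0):
    """Split a light curve at gaps wider than max_gap days and keep only
    frames with at least min_points samples (the criteria described above)."""
    t = np.asarray(t, float)
    x = np.asarray(x, float)
    breaks = np.where(np.diff(t) > max_gap)[0] + 1
    return [(ts, xs)
            for ts, xs in zip(np.split(t, breaks), np.split(x, breaks))
            if len(ts) >= min_points]
```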
For the methods involving interpolation, a cubic spline is used to approximate
the light curve, and the interpolated curve is then re-sampled uniformly at Fs = 20
samples per day. For Fourier analysis we simply use the uniformly sampled data
obtained after spline interpolation, windowed using a Hamming window. The highest
peak in the spectrum gives the peak frequency which, when inverted, gives the
estimated time period. For analysis using auto-correlation, the uniformly sampled data is
directly passed to the auto-correlation function. The plot of the output of the
auto-correlation function is then analyzed to obtain the highest peak, disregarding the peak at zero
lag; this peak corresponds to the periodicity of the light curve. While performing
the Fourier transform, the size of the transform is the least power of 2, i.e. 2^k such that 2^{k−1} < L ≤ 2^k,
where L is the total number of samples in the time domain signal.
In the case of the Lomb periodogram and the Dirichlet transform, we use the original
samples directly for period estimation. We compute both with a frequency resolution of
1/2048. In both cases, again, the peak corresponds to the periodicity of the light curve.
Table 2-1 gives the period values estimated for the various light curves by the
four methods described above. The values which are correct estimates of the true period
are shown in bold.
In Table 2-1 we observe that in many cases the estimated period is
half of the true period. The reason is that eclipsing binary star systems
have two eclipses per cycle, which gives rise to a modulation effect in
the signal. The effect is similar to that shown in Figure 5-1: the signal
has a cycle length of around 88 days but, due to the modulation, a trough appears
about every 44 days. Periodogram techniques aim at fitting
sine waves to the signal and hence, as expected, tend to give a peak at the frequency
corresponding to twice the cycle rate. Even the ACF fails due to this modulation effect.
Hence, keeping in mind the drawbacks of periodogram based methods and the challenges
posed by the nature of the data, we move towards kernel based methods.
CHAPTER 3
PERIODICITY ESTIMATION USING KERNEL BASED METHODS
The present chapter introduces the recently developed kernel based technique
known as correntropy and then presents a spatio-temporal kernel based technique.
Then an algorithm proposed in [4] utilizing interpolation and correntropy is used for
simulation purposes and compared to the newly proposed spatio-temporal kernel based
technique.
3.1 Correntropy
Correntropy is a generalized correlation function introduced in [13]. It is
defined through inner products of vectors, which can be computed using a positive definite
kernel function κ satisfying Mercer's conditions, as in Equation 3–1:
κ(xᵢ, xⱼ) = ⟨φ(xᵢ), φ(xⱼ)⟩, (3–1)
where φ(xᵢ) transforms the data xᵢ non-linearly from the input space to a high-dimensional
feature space. There are various types of kernel functions, such as Gaussian, spline or
sigmoid, but in this particular case we have used the Gaussian kernel. The Gaussian
kernel is defined as follows:

κ(xₜ, xₛ) = (1/(√(2π) σ)) exp{−(xₜ − xₛ)² / (2σ²)} (3–2)
The value σ is known as the kernel size and is the free parameter in Equation
3–2; it is chosen from the data set itself. For defining correntropy
a Gaussian kernel has been used. Given a random process {xₜ, t ∈ T}, where t denotes
time and T denotes the index set of interest, the correntropy function is defined as

V(t, s) = E[κ(xₜ, xₛ)] (3–3)
Applying a Taylor series expansion to the Gaussian kernel, we can express the
correntropy function as

V(t, s) = (1/(√(2π) σ)) ∑_{k=0}^{∞} ((−1)ᵏ / ((2σ²)ᵏ k!)) E[(xₜ − xₛ)²ᵏ] (3–4)
To estimate a univariate correntropy function we require the even moments
to be shift invariant, which is a stronger condition than the wide sense stationarity
required by the correlation function. Hence, to use the lag τ as in

V(τ) = V(t + τ, t) = (1/N) ∑_{t=0}^{N−1} κ(xₜ, x_{t+τ}) (3–5)

strict stationarity of the even moments is a sufficient condition when the Gaussian kernel is
used. So V(τ) is estimated using Equation 3–5. The variable σ in Equation 3–2
determines the emphasis given to higher order moments as compared to the second order
moment. Thus correntropy is a function of two arguments, similar to correlation, but with
the addition of higher order moments introduced by the kernel function. As σ increases,
the higher order moments decay, the second order moment dominates, and
correntropy approaches correlation. Due to the introduction of higher order moments,
correntropy has been found to produce sharper and narrower peaks in
similarity estimation than the simple correlation function.
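The estimator of Equation 3–5 can be sketched directly. This is an illustrative NumPy version, not the thesis code; the kernel size and test signal in the usage are arbitrary.

```python
import numpy as np

def correntropy(x, max_lag, sigma):
    """Estimate V(tau) of Equation 3-5 with a Gaussian kernel of size sigma."""
    x = np.asarray(x, float)
    norm = 1.0 / (np.sqrt(2.0 * np.pi) * sigma)
    v = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        d = x[: len(x) - tau] - x[tau:]     # sample pairs at a fixed lag tau
        v[tau] = norm * np.mean(np.exp(-d**2 / (2.0 * sigma**2)))
    return v
```

On a periodic signal, the largest non-zero-lag peak of V(τ) sits at the period.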
Another important property which correntropy induces in the input space is a well
defined metric known as the correntropy induced metric (CIM) [14]. It is defined as

CIM(X, Y) = (κ(0) − V(X, Y))^{1/2} = (κ(0) − (1/N) ∑_{n=0}^{N−1} κ(xₙ, yₙ))^{1/2} (3–6)
CIM can also be thought of as the root mean squared error (RMSE) between the two
random variables in the transformed high-dimensional space. For the Gaussian kernel, it
has been observed that CIM behaves like an L2 norm when the two vectors are close,
like an L1 norm farther out, and, as the vectors move far apart, becomes insensitive to the
distance between them (an L0 norm). The extent of the space over which the CIM
Figure 3-1. Contour and surface plot of CIM(X,Y) with Y=0 in 2D space with a Gaussian kernel and a kernel size equal to 1. A) Contour plot. B) Surface plot.
acts as an L2 or L0 norm is directly related to the kernel size σ. This is illustrated in Figure
3-1. This unique property of CIM is very useful for rejecting outliers; in this respect it
differs from simple correlation, which provides a global measure.
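The saturation behavior just described can be sketched under Equation 3–6 with the Gaussian kernel of Equation 3–2. This is an illustrative NumPy sketch; the test vectors are made up. Note how distances beyond a few kernel sizes all map to nearly the same CIM value, which is what rejects outliers.

```python
import numpy as np

def gaussian_kernel(d, sigma):
    """Gaussian kernel of Equation 3-2 applied to a difference d."""
    return np.exp(-np.asarray(d, float)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def cim(x, y, sigma=1.0):
    """Correntropy induced metric of Equation 3-6."""
    k0 = gaussian_kernel(0.0, sigma)
    v = np.mean(gaussian_kernel(np.asarray(x, float) - np.asarray(y, float), sigma))
    return float(np.sqrt(k0 - v))
```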
Another important concept associated with correntropy is the correntropy
spectral density (CSD). It is defined as

P[f] = ∑_{τ=−∞}^{∞} (V(τ) − ⟨V(τ)⟩) e^{−j2πfτ/Fₛ} (3–7)
where ⟨V(τ)⟩ is the mean value of correntropy. This is equivalent to the Fourier transform
of the centered correntropy function.
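Given a correntropy curve V(τ) on a uniform lag grid, the CSD is just the Fourier transform of the centered curve; a minimal sketch using NumPy's FFT (magnitude only, for simplicity) follows. The function name is our own.

```python
import numpy as np

def csd(v):
    """Correntropy spectral density (Equation 3-7): magnitude of the
    Fourier transform of the centered correntropy curve."""
    v = np.asarray(v, float)
    return np.abs(np.fft.rfft(v - v.mean()))
```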
3.2 Spatio-temporal Kernel based Proposed Method
In the previous section we defined correntropy, which has been found to perform
better than auto-correlation. Applying correntropy in the same fashion as auto-correlation
would require uniformly sampled data. In the case of non-uniformly sampled data, we
can perform interpolation, re-sample at regular intervals and then apply
correntropy, as described in [4]. But in this method, as we shall see, the
original data points are never used, and many sample points are dropped while
finding frames of data without large gaps. Hence, a measure similar to
correntropy that uses the non-uniformly sampled data directly would be
more useful. Before proceeding to solve the problem, we analyze the difficulty faced
while implementing simple correntropy on non-uniformly sampled data. In Figure 3-2
we see that in case of regularly sampled data for every lag value, sample point ’A’ in the
reference signal has a corresponding value in the time shifted signal which is not true
for the non-uniformly sampled data. Hence in uniformly sampled case we have pairs of
points which are passed into the kernel and the kernel output is summed over all pairs
to give the correntropy value as in Equation 3–5. Therefore in the present scenario of
non-uniform data instead of pairing each sample point with another sample value at a
fixed lag we pair each sample point with every other sample point but assign a certain
weight to each of those pairing as shown in Figure 3-2. Hence for a particular lag value
τ if two samples were sampled at time instants which are different from τ then that pair
is assigned a weight as κt(t, s + τ). We can clearly see that if the samples are exactly
spaced at interval τ then the weight assigned is maximum. Hence Equation 3–8 deals
with the new kernel output for the sample value in reference signal sampled at time ’t’.
Equation 3–9 shows the summation over all samples in the data set and finally Equation
3–10 deals with the normalization to give the expected value of the kernel output over all
samples compared to simple correntropy. In Equation 3–8,3–9 and 3–10 the left hand
side is the expression related to simple correntropy and the right hand side shows the
expression for new two dimensional kernel based technique. Also in Equations 3–8,3–9
and 3–10 κ deals with the sample values and κt deals with the time instant values.
κ(xt, xt+τ) · 1 → ∑_s κ(xt, xs) · κt(t, s + τ)   (3–8)
Figure 3-2. Figure illustrating why simple correntropy cannot be directly used for non-uniformly sampled data.
∑_t κ(xt, xt+τ) · 1 → ∑_t ∑_s κ(xt, xs) · κt(t, s + τ)   (3–9)
(1/N) ∑_t κ(xt, xt+τ) · 1 → [∑_t ∑_s κ(xt, xs) · κt(t, s + τ)] / [∑_t ∑_s κt(t, s + τ)]   (3–10)
Equation 3–10 defines the final expression for applying kernels to non-uniform data in a fashion similar to correntropy. We use a Gaussian kernel for the time kernel κt.
To implement the idea of applying kernels directly to non-uniformly sampled data we propose a new spatio-temporal kernel. It is defined on 2D vectors, and the inner product of vectors can be computed using a positive definite kernel function κ as defined in Equation 3–1. Equation 3–2 uses a one-dimensional value, but here we are dealing with two-dimensional vectors. More concretely, we define a two-dimensional vector h which has the time value in one dimension and the magnitude value in the other, expressed as ha = [ta, xa]T and hb = [tb, xb]T. The product kernel κ is defined as

κ(ha, hb) = κ1(ta, tb) · κ2(xa, xb)   (3–11)

where κ1 and κ2 are both Gaussian kernels as defined in Equation 3–2, defined on the time (t) and magnitude (x) components of the data set respectively. This kernel is still positive definite, being effectively a two-dimensional Gaussian kernel with a diagonal covariance matrix whose first diagonal component σ1 deals with the time component tk and whose second diagonal component σ2 deals with the magnitude xk at that time instant. Thus the new cost function is defined as follows:
K(τ) = [∑_{a,b} κ(ha, hb + h0τ)] / [∑_{a,b} κ1(ta, tb + τ)]   (3–12)

where h0τ = [τ, 0]T.
Now the periodicity estimation procedure is defined as follows:

1. Let H = {hk = [tk, xk]T, 1 ≤ k ≤ N} where N is the total number of data points obtained by selecting frames of the light curve.

2. For the trial period T = τ, calculate the new cost function K(τ) as defined in Equation 3–12, where (a, b) ranges over all pairs of samples.

3. Vary the value of τ over the range 0.5 to 200 with a step size of 0.001 and repeat step 2.

4. The value of τ which gives the first significant peak is the desired period.

A significant peak is determined as follows:

1. Let the minimum value of the plot be denoted by Mn and the maximum value by Mx.

2. Dynamic range: d = Mx − Mn.

3. Threshold: Th = Mn + 0.9 · d.

4. Any peak which exceeds the threshold Th is a significant peak.
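The procedure above can be sketched in Python as follows. This is an illustrative implementation of Equations 3–11 and 3–12 together with the significant-peak rule, using the kernel sizes of Section 3.3 as defaults; the function names, the coarse trial-period grid, and the simple local-maximum test are our own choices, not taken from the thesis:

```python
import numpy as np

def k_measure(t, x, taus, sigma1=0.4, sigma2_frac=0.1):
    """Spatio-temporal kernel measure of Equation 3-12 on non-uniform samples.

    t, x : sampling instants and magnitudes of the light curve.
    taus : trial periods to evaluate.
    sigma1 is the time-kernel size; sigma2 is a fraction of the
    magnitude dynamic range, as suggested in Section 3.3.
    """
    t, x = np.asarray(t, float), np.asarray(x, float)
    sigma2 = sigma2_frac * (x.max() - x.min())
    dt = t[:, None] - t[None, :]                              # t_a - t_b, all pairs
    kx = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma2 ** 2))
    out = np.empty(len(taus))
    for i, tau in enumerate(taus):
        kt = np.exp(-(dt - tau) ** 2 / (2 * sigma1 ** 2))     # kappa_1(t_a, t_b + tau)
        out[i] = (kx * kt).sum() / kt.sum()                   # Equation 3-12
    return out

def first_significant_peak(taus, k, frac=0.9):
    """First local maximum exceeding Mn + frac * (Mx - Mn)."""
    th = k.min() + frac * (k.max() - k.min())
    for i in range(1, len(k) - 1):
        if k[i] > th and k[i] >= k[i - 1] and k[i] >= k[i + 1]:
            return taus[i]
    return None
```

With a non-uniformly sampled noiseless sinusoid the first significant peak recovers the period; the thesis scans with a much finer step (0.001) than this sketch would use.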
3.3 Kernel Size
In the present section we focus on the selection of the kernel sizes for the technique proposed in Section 3.2. We can observe in Figure 3-3 that the peak becomes more prominent and the plot becomes smoother as σ1 increases. We also take into account that for light curves with a small time period, a larger value of σ1 would flatten the peak. This happens because for a smaller time period the rate of change of the magnitude over a fixed time interval is larger than for a data set with a larger time period. Hence, for data sets with smaller time periods, many pairs which should have been given less weight, owing to the difference in their sampling instants, are given more weight. This distorts the plot and suppresses the peak at the true period. A trade-off is therefore made between these two opposing factors, and we set σ1 = 0.4.
For the magnitude kernel, the value of σ2 is chosen relative to the dynamic range of the amplitude. Choosing a very large kernel size means that any two magnitude values passed through the kernel give similar outputs, as the kernel tapers very slowly. Choosing a very small kernel size gives an output of 1 only for equal magnitude values and an output close to zero for any other pair of amplitudes. This is clearly reflected in Figure 3-4, where, in the plots of the measure against trial period for several values of σ2/(dynamic range of magnitude), a large kernel size gives a flat plot with values close to one irrespective of the assumed period, while a small kernel size gives a plot with values close to zero. Therefore, to obtain a sharp peak at the true period, we choose σ2/(dynamic range of magnitude) = 0.1 as the optimum value.

For simplicity, and to restrict the plot values between 0 and 1, we drop the normalizing factor for unit integral in the Gaussian kernel.
[Figure 3-3 panels: light curve 1.3810.19 with sigma(time) = 0.1 and 0.4, and light curve 1.3449.27 with sigma(time) = 0.4 and 2; each panel plots the measure (0 to 1) against trial period (0 to 100).]
Figure 3-3. Plot of the 2D kernel based measure with varying standard deviation values for the time kernel. Light curve 1.3810.19 has a time period of 88.9406 days and light curve 1.3449.27 has a time period of 4.0349 days.
3.4 Results
In this section we compare the proposed 2D-kernel-based technique to another technique involving correntropy, due to [4]. That technique basically involves interpolation of the data, followed by correntropy, and then calculating
[Figure 3-4 panels: sigma(magnitude)/(dynamic range of magnitude) = 0.01, 0.1, 1 and 10; each panel plots the measure (0 to 1) against trial period (0 to 100).]
Figure 3-4. Plot of the 2D kernel based measure with varying standard deviation values for the magnitude kernel. Light curve 1.3810.19 is the data set used and has a true time period of 88.9406 days.
the CSD. Peaks are identified from the CSD and used to estimate the period of the light curve. The results are compared in Table 3-1.

In Table 3-1 we see that both correntropy on interpolated data and the proposed method using 2D kernels perform better than the methods described in Chapter 2. First, we observe that for the interpolation-based methods, in the case of auto-correlation and the FFT,
Table 3-1. Comparative performance of the proposed 2D kernel based technique and of correntropy on the interpolated light curve due to [4], along with the results published by the Harvard University Time Series Center. Correctly identified values are marked in bold.

Light curve (blue channel)   Harvard-TSC   Proposed technique   Correntropy on interpolated data
1.3810.19                    88.9406       88.83                44.4685
1.4411.612                   45.1143       22.89                22.4889
1.4168.434                   43.9301       43.93                21.8746
1.3809.1058                  28.9073       14.31                14.3489
1.4652.565                   27.5718       27.91                13.7102
1.4288.975                   17.6131       17.97                8.6845
1.4539.778                   16.2502       65.03                16.2076
1.4173.1409                  14.1534       14.06                7.0486
1.3449.948                   14.0064       14.03                6.9994
1.4174.104                   8.4929        16.99                4.2412
1.4538.81                    5.5343        11.01                5.4845
1.3564.163                   4.7155        18.89                4.6179
1.3804.164                   4.1875        20.97                4.1878
1.3449.27                    4.0349        4.023                4.0239
1.3448.153                   3.2765        22.88                3.2211
1.4539.37                    2.9955        3.0                  2.9570
1.3442.172                   1.02059       22.98                22.7556
1.3325.93                    0.95176       20.02                215.04
1.3444.880                   0.90286       19.182               43.8857
1.3447.783                   0.83615       17.975               275.5491
we get only three hits each, but seven correct identifications for the CSD-based method. Although the interpolation was done on the same frames of data, we see that the CSD performs better. Interpolation introduces a significant amount of error when the gaps in a frame are comparable to the true value of the time period. This can be seen from the fact that in Table 2-1, for auto-correlation and the FFT, almost all of the correctly identified data sets have a larger time period; in particular, all of the correct identifications for auto-correlation are light curves with a time period greater than 25 days. The CSD-based method, by contrast, correctly identifies data sets with time periods as low as 3 days, despite the fact that the maximum allowed gap size when choosing a frame is 10 days. This clearly shows that the correntropy-based method is very efficient at rejecting outliers, which in this case are the interpolated data generated in the regions of large gaps. Another interesting observation is that the CSD-based method identifies the time period as half of the true value in 9 cases. This can again be attributed to the modulation effect observed in the light curves, as shown in Figure 5-1.
The proposed technique using a 2D kernel again performs better than the existing techniques, obtaining 8 correct identifications. It too correctly identifies data sets with time periods as low as 3 days. The use of a Gaussian kernel helps in correctly identifying the useful sample pairs while calculating the final measure. For 6 light curves the proposed method gives a period that is a multiple of the true period. The reason is that the method is unable to find enough sample pairs whose time difference is close to the true period; the 2D kernel based measure then produces no peak at the true period, but rather peaks at those multiples of the true period for which it finds a sufficient number of sample pairs. In 2 cases the 2D kernel based method also gives a period that is half of the true value, which can be attributed to the modulation effect explained earlier.
Although these kernel-based methods perform much better than the existing methods, both techniques still fail to produce any result for data sets with a time period less than or close to 1 day, as can be seen in Table 3-1 for the last four light curves. The reason is that the average sampling interval in these data sets is always greater than 1 day. We therefore need an algorithm which can detect the correct time period even when the average sampling interval exceeds the true time period of the data. The data are non-uniformly sampled, which implies that we have information from various phases within the period of the light curve. Thus we need an algorithm which exploits this information to approximate a single period of the light curve.
CHAPTER 4
PERIODICITY ESTIMATION USING SPATIO-TEMPORAL KERNEL BASED CORRENTROPY ON FOLDED TIME SERIES DATA
This chapter defines a two-dimensional kernel based correntropy and then uses it to identify the periodicity of periodic signals and to quantify the likely period. Before describing the steps of the proposed technique we first discuss the idea behind it. A periodic signal repeats itself after a fixed interval of time. If we compare two samples collected at an interval equal to a multiple of the period of the signal, these values are expected to be equal in magnitude. In our case this happens rarely because the signal is non-uniformly sampled with gaps, and there is considerable noise and modulation. Still, if we take two samples at an interval close to a multiple of the true period, their magnitudes will be comparable. This suggests folding the observations to the principal argument of the period: if we know the period, we can reconstruct one period of data as x(t) = x(t + nT), where T is the period and n is an integer. This idea is illustrated in Figure 4-1, where a signal with a time period of 10 units and an average sampling time of 1 unit is used to reconstruct a single period. If we fold the data using a value of T which is not a multiple of the true period, the actual signal is not recovered. It is easy to see that the period T will yield the smoothest representation in the principal argument domain, whereas a value that is not an integral multiple of T will yield a noisy representation, as illustrated in Figure 4-2. We therefore need a methodology for comparing the similarity of the samples both in time and in amplitude, which will be implemented with a two-dimensional kernel. We have seen how a single period of the signal can be created when the true period is known. Unfortunately this method is greedy, and many different trial period values need to be evaluated to obtain the period for which the similarity is the highest.
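The folding operation itself is a one-line transformation. The sketch below (our own helper, not thesis code, with the phase normalized to [0, 1) as in step 2 of Section 4.1) folds the observations at a trial period and sorts them by phase; folding at the true period yields the smooth representation, folding at an unrelated value the noisy one:

```python
import numpy as np

def fold(t, x, trial_period):
    """Fold observations to the principal argument of the trial period.

    Each time stamp is mapped to its normalized phase (t mod P)/P in
    [0, 1), and the samples are returned sorted by phase.
    """
    phase = np.mod(np.asarray(t, float), trial_period) / trial_period
    order = np.argsort(phase)
    return phase[order], np.asarray(x, float)[order]
```

A simple smoothness proxy, the mean absolute difference between consecutive folded amplitudes, is small at the true period and large otherwise; this is the intuition the 2D kernel formalizes below.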
More concretely, we define a two-dimensional vector h which has the time value in one dimension and the magnitude value in the other. It is expressed as ha = [ta, xa]T and
Figure 4-1. Reconstruction of a single period of the signal by breaking the original signal into frames of length equal to the true time period of the signal.
hb = [tb, xb]T . The product kernel κ is defined as;
κ(ha,hb) = κ1(ta, tb) ∗ κ2(xa, xb) (4–1)
where kappa1 and kappa2 are both Gaussian kernel as defined in Equation 3–2 defined
on time (t) and magnitude (x) component of the data set respectively. This kernel is
still positive definite, being effectively a Gaussian kernel with diagonal covariance matrix
Figure 4-2. Folding performed on a non-uniformly sampled signal with true period equal to 1 unit. Folding has been performed with trial periods equal to 1 unit and 1.3 units.
whose first diagonal component σ1 deals with the time component tk and whose second diagonal component σ2 deals with the magnitude xk at that time instant. Using the newly defined kernel, the correntropy is defined as follows:

V = (1/(N − 1)) ∑_{i=1}^{N−1} κ(hi, hi+1)   (4–2)
where hi is an ordered sequence of vectors.

Section 4.1 describes the proposed technique for estimating the time period of the signal.
4.1 Period Estimation
The algorithm for estimating the period T is as follows:

1. Let H = {hk = [tk, xk]T, 1 ≤ k ≤ N} where N is the total number of data points obtained by selecting frames of the light curve.
2. For the trial period T = p, apply the transformation φp to H such that φp(H) = Y, where Y = {Ψk = [τk, xk]T, 1 ≤ k ≤ N} and τk = (tk − ⌊tk/p⌋ p)/p, with ⌊·⌋ denoting the floor function.

3. Order the transformed vectors such that Ψ_{ki} precedes Ψ_{ki+1} if τ_{ki} ≤ τ_{ki+1}; if τ_{ki} = τ_{ki+1}, order by amplitude so that x_{ki} ≤ x_{ki+1}.

4. Calculate the correntropy V(p) with the 2D kernel as in Equation 4–2.

5. Calculate the correntropy with the time kernel only, as a normalizing factor: U(p) = (1/(N − 1)) ∑_{i=1}^{N−1} κ1(τ_{ki}, τ_{ki+1}).
6. Vary the value of p over a range and repeat steps 2 to 5.

7. The value of p which gives the first significant peak in the plot of V(p)/U(p) is the desired period.

A significant peak is determined as follows:

1. Let the minimum value of the plot be denoted by Mn and the maximum value by Mx.

2. Dynamic range: d = Mx − Mn.

3. Threshold: Th = Mn + 0.9 · d.

4. Any peak which exceeds the threshold Th is a significant peak.
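The steps above can be condensed into the following Python sketch. It implements steps 2 to 5 and scores each trial period by V(p)/U(p); for simplicity it returns the global maximum over the grid rather than the first significant peak, and σ1 is set so that σ1 · (average sampling rate) = 1 on the normalized phase axis, as Section 4.2 suggests. The function names and the demo signal are our own, not from the thesis:

```python
import numpy as np

def folded_correntropy(t, x, p, sigma1, sigma2):
    """V(p)/U(p) of Section 4.1 for one trial period p."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    tau = np.mod(t, p) / p                        # step 2: normalized phase in [0, 1)
    order = np.lexsort((x, tau))                  # step 3: sort by phase, then amplitude
    tau, x = tau[order], x[order]
    dtau, dx = np.diff(tau), np.diff(x)           # consecutive ordered vectors
    kt = np.exp(-dtau ** 2 / (2 * sigma1 ** 2))   # time kernel kappa_1
    kx = np.exp(-dx ** 2 / (2 * sigma2 ** 2))     # magnitude kernel kappa_2
    return np.mean(kt * kx) / np.mean(kt)         # step 4 over step 5

def estimate_period(t, x, p_grid):
    """Scan trial periods and return the best-scoring one (simplified:
    global maximum instead of the thesis's first significant peak)."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    n = len(t)
    sigma1 = 1.0 / n                # phases lie in [0, 1): sigma1 * (avg rate) = 1
    sigma2 = 0.1 * (x.max() - x.min())  # Section 4.2 choice
    scores = [folded_correntropy(t, x, p, sigma1, sigma2) for p in p_grid]
    return p_grid[int(np.argmax(scores))]
```

On a clean non-uniformly sampled sinusoid this recovers the period; with real light curves the first-significant-peak rule above is needed to avoid selecting a multiple of the period.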
In the above algorithm the range depends on some a priori knowledge of the periods of interest. A range of 0.5 to 200 is used with a step size of 0.0001; for values less than 2 a step size of 0.00001 is used. The reason for the different step sizes is that for lower values of p a small deviation in the estimated period can give a noisy period reconstruction, as the number of cycles in the given time interval is larger. This is explained in the Appendix.
4.2 Kernel Size
In this section we look into the selection of the kernel sizes. The value of σ1 is chosen relative to the average sampling period (determined by dividing the time interval over which all vectors are spread by the total number of vectors). We can observe in Figure 4-3 that the peak becomes more prominent as σ1 · (average sampling rate) increases. We also take into account that consecutive vectors in time passed through the kernel should be given more importance than vectors which are far apart in time. Giving more weight to consecutive vectors is especially significant because, after the transformation of the 2D vectors in the proposed technique, we are measuring the similarity between vectors that are consecutive in time; this calls for reducing the kernel size. A trade-off is therefore made between these two opposing factors, and we set σ1 · (average sampling rate) = 1. Note that the average sampling rate is fixed for all values of the assumed period while scanning over a range, because in the proposed technique all 2D vectors are scaled into the time range 0–1 after the modulo operation and the total number of vectors is fixed.
Similarly, for the magnitude the value of σ2 is chosen relative to the dynamic range of the amplitude. Choosing a very large kernel size means that any two magnitude values passed through the kernel give similar outputs, as the kernel tapers very slowly. Choosing a very small kernel size gives an output of 1 only for equal magnitude values and an output close to zero for any other pair of amplitudes. This is clearly reflected in Figure 4-4, where, in the plots of the measure against trial period for several values of σ2/(dynamic range of magnitude), a large kernel size gives a flat plot with values close to one irrespective of the assumed period, while a small kernel size gives a plot with values close to zero. Therefore, to obtain a sharp peak at the true period, we choose σ2/(dynamic range of magnitude) = 0.1 as the optimum value.

For simplicity, and to restrict the plot values between 0 and 1, we drop the normalizing factor for unit integral in the Gaussian kernel.
[Figure 4-3 panels: sigma(time) × (avg. sampling rate) = 0.01, 0.1, 1 and 10; each panel plots the correntropy value (0 to 1) against trial period (0.5 to 1.5).]
Figure 4-3. Plot of the correntropy of the transformed space with varying standard deviation values for the time kernel.
4.3 Results
In this section the results are presented for 20 data sets and compared with the results published by the Harvard University Time Series Center, which are used as the standard for comparison.

This algorithm shows a significant improvement over all the previously mentioned methods for estimating the time period. The method uses correntropy based
[Figure 4-4 panels: sigma(magnitude)/(dynamic range of magnitude) = 0.01, 0.1, 1 and 10; each panel plots the correntropy value (0 to 1) against trial period (0.5 to 1.5).]
Figure 4-4. Plot of the correntropy of the transformed space with varying standard deviation values for the magnitude kernel.
on a 2D kernel and is hence able to exploit the advantage of kernel-based methods in rejecting outliers. Moreover, the correntropy is computed on the folded time series data, which makes it robust to the effects of the average sampling rate compared with the interpolation-based methods or the kernel-based methods described earlier. In Table 4-1 we see that the proposed algorithm gives 15 correct identifications, and in the remaining 5 cases it gives a value equal to half of the true period.
Table 4-1. Comparative performance of the proposed correntropy based technique along with the results published by the Harvard University Time Series Center. Correctly identified values are marked in bold.

Light curve (blue channel)   Harvard-TSC   2D correntropy based technique
1.3810.19                    88.9406       44.5017
1.4411.612                   45.1143       45.1441
1.4168.434                   43.9301       43.9313
1.3809.1058                  28.9073       14.4546
1.4652.565                   27.5718       27.5748
1.4288.975                   17.6131       17.6116
1.4539.778                   16.2502       16.2508
1.4173.1409                  14.1534       14.1509
1.3449.948                   14.0064       14.0059
1.4174.104                   8.4929        8.4928
1.4538.81                    5.5343        5.5344
1.3564.163                   4.7155        4.7156
1.3804.164                   4.1875        4.1876
1.3449.27                    4.0349        4.0347
1.3448.153                   3.2765        3.2764
1.4539.37                    2.9955        1.4977
1.3442.172                   1.02059       0.5103
1.3325.93                    0.95176       0.95176
1.3444.880                   0.90286       0.90286
1.3447.783                   0.83615       0.41807
The robustness to the average sampling rate can be seen from the last four light curves, with time periods close to 1 day: the method gives 2 hits, and in the other two cases it gives half the true time period. The 5 cases where the estimate is half of the true time period can be attributed to the modulation effect described in earlier chapters. Interestingly, in all five of these cases the peak at the true time period was larger than the peak at half that value. Thus, if the threshold could be fine-tuned in some way, we might be able to reach 100% accuracy.
Although this method performs better than the other techniques described earlier, it requires a larger computation time because of the folding in time performed at each trial value. Another important direction is therefore to obtain a rough estimate of the time period using a faster, reliable technique, and then to apply the proposed method over a smaller range of values to identify the period precisely.
CHAPTER 5
CONCLUSION
In Chapters 2, 3 and 4 we see that in many cases the identified peak is at half of the true period. The reason for a peak at a sub-multiple of the true period is the shape of the signal, which can be seen in Figure 5-1. The modulation effect within a period is responsible for the peak at half the true value. This modulation affects periodogram techniques such as the Lomb periodogram and the Dirichlet transform the most, as these methods try to fit sine waves to the data; they tend to report the fundamental frequency rather than the actual number of periodic cycles per unit time. Out of the 20 data sets, in 15 instances the Lomb periodogram and the Dirichlet transform identify the time period as half of the true value, as can be seen in Table 2-1. The same is seen when the FFT is used on interpolated data, where in 7 instances the detected period is half of the true period.
Since we are dealing with eclipsing binary star systems, with the magnitude waveform shown in Figure 5-1, the fundamental frequency identified by the spectral methods almost always turns out to be twice the period cycle rate. The interpolation-based methods of Chapter 2 have the added disadvantage that for data sets with a smaller time period the average number of samples available per period for interpolation is small, and hence the quality of the interpolation suffers greatly. This can be seen from the fact that as the time period of the test data set decreases, the interpolation-based methods produce more erroneous results: for the FFT and auto-correlation methods we get three correct identifications among the first 10 data sets, but none among the final 10, which have smaller time periods.
In Chapter 3 the newly proposed method gives 8 correct estimates whereas the method proposed in [4] gives 7 hits. Again, one thing to observe is that for the 4
Figure 5-1. Magnitude plot of light curve 1.3810.19. Note that the y-axis represents the magnitude of the star. Magnitude measures the brightness of a celestial object; the brighter the object appears, the lower the value of its magnitude. It is customary in astronomy to plot the magnitude scale reversed.
data sets with a time period close to one day, neither method gives a correct estimate. The problem arises because these data sets have, on average, one sample per period, which makes it difficult either to interpolate the samples or to estimate the period.
In Chapter 4 the proposed method uses folding to reconstruct a single period. Thus, even though the data sets have few samples per period on average, the method is not affected. Hence this technique estimates with a very high degree of accuracy even the periods of data sets with time periods close to or less than a day. In fact, it gives 15 accurate estimates, and the remaining 5 give values which are half of the true period. If we compare this method with the methods described earlier, the
Figure 5-2. Plot of spatio-temporal kernel based correntropy for light curve 1.3448.153.
Figure 5-3. Plot of spatio-temporal kernel based correntropy for light curve 1.3810.19.
Table 5-1. Performance evaluation of the existing techniques and the proposed techniques. The results published by the Harvard University Time Series Center have been used as the gold standard for the evaluation.

Index  Method used                                         Correct identifications   Average absolute relative error
                                                           (20 light curves used)    for correctly identified time periods
1      FFT on interpolated data                            3                         0.01246
2      Auto-correlation on interpolated data               3                         0.00202
3      Lomb periodogram                                    2                         0.00008
4      Dirichlet transform                                 2                         0.00008
5      CSD on interpolated data                            7                         0.00581
6      2D kernel based measure (Proposed)                  8                         0.00927
7      2D kernel based correntropy on folded               15                        0.00009
       time series data (Proposed)
values of the estimates given by the 2D kernel based correntropy technique are more accurate. Another point to note is that the peaks obtained in this case are very sharp, especially for data sets with a smaller time period; this can be seen in Figure 5-2, which has a sharper peak than Figure 5-3. Thus the spatio-temporal kernel based correntropy method is superior to the existing methods not only in the number of hits, where it estimates the period correctly, but also in the accuracy of those hits, as can be seen in Table 5-1.
Future Directions
We have seen that for interpolation-based methods the average number of samples per period affects performance. These methods use a fixed allowable gap size of 10 days when selecting frames, but for light curves with smaller time periods, especially those with a period of less than 10 days, this severely impairs the interpolation and hence causes most of the methods to fail. To improve this, an adaptive allowable gap size could be used: light curves with smaller time periods would use a smaller gap size than those with larger time periods.
The periodogram-based techniques suffer from the fact that they tend to fit sinusoids to the data sets and hence report the time period corresponding to the fundamental frequency. One observation is that periodogram-based techniques simply use sine waves as basis functions to capture the information content of the data set. In future work, a new set of basis functions could be developed using fuzzy knowledge about the shape of the curves, which would make the method more robust to modulation effects.
The first proposed method, which uses a spatio-temporal kernel, also fails for data sets with smaller time periods. This is because the method uses a fixed kernel size for all lag values, while for data sets with smaller time periods the kernel needs to be sensitive to small changes in the time differences of the sample pairs: for light curves with smaller time periods, a given shape changes at a higher rate, so a smaller kernel size would be preferable.
The second proposed technique, using spatio-temporal kernel based correntropy, shows a significant improvement over previously existing methods, although it still fails in a few cases due to the modulation effect. To counter the modulation effect we introduced the concept of a significant peak in Section 4.1. Although the threshold adapts to each data set, the fraction of the dynamic range used is fixed, whereas it actually depends on the amount of modulation; this also holds for the first proposed method based on the spatio-temporal kernel. We need a way to identify the degree of modulation, in other words the shape of the curve. Another drawback is that we have to scan over a fixed range of values to identify the true period. The development of an efficient algorithm to detect the range of values, or the order of magnitude, of the time period is therefore worth looking into, as it would reduce the number of computations. The improvement in computation time would be significant, because folding in time is performed for each trial period value, which considerably slows down the algorithm; fewer trial periods would thus reduce the computation time substantially.
Thus we see that both the interpolation-based techniques and the proposed methods need, in some way, a rough estimate of the time period to become more adaptive and thus perform better. For the periodogram-based methods we need to develop basis functions specific to the light curve to use instead of pure sinusoids. The proposed methods likewise need some knowledge of the shape of the curve to improve their performance further.
APPENDIX: VARIABLE STEP SIZE
This appendix deals with the step size used when scanning over a range of values to implement the period estimation algorithm described in Chapter 4. A variable step size is necessitated by the fact that a smaller step size increases the computational complexity, while a larger step size may cause the algorithm to miss the peak when the true period of the light curve is small. If the periodicity of the light curve is large, we can afford a larger step size when scanning the range of values without missing the peak corresponding to the time period. This effect can be seen in Figures 5-2 and 5-3: for light curve 1.3448.153, which has a time period of 3.2764 days, the peaks are much sharper, while for light curve 1.3810.19, which has a time period of 88.94063 days, the peaks are wider. This means we can afford a larger step size for light curve 1.3810.19 and still identify the peaks, whereas we cannot use a larger step size for light curve 1.3448.153 without risking missing the peak. Below we show how choosing too large a step size can cause the algorithm to fail.
Let $T$ be the number of days over which the light curve data are collected, $p$ the
true (unknown) period, $q$ the trial period (the variable in our experiment), and $r$ the
step size used in the neighborhood of $p$. This value $r$ is the resolution with which the
range of values around $p$ is scanned. An error is introduced because the true period
may not be present in the set of trial period values; it is minimized when the trial period
closest to the true period is used while scanning over a fixed range. Let this error be
$\varepsilon$, so that $q = p + \varepsilon$. It is easy to see that $|\varepsilon| < r/2$.
During the folding process, the number of times the light curve is folded is
$T/(p+\varepsilon)$, so the error accumulated over all the periods is
$\frac{T}{p+\varepsilon}\,\varepsilon$. The phase shift from the first period through the
last period is therefore
\begin{equation}
\frac{T}{p+\varepsilon}\cdot\frac{2\pi|\varepsilon|}{p}
\approx \frac{2\pi|\varepsilon|T}{q^{2}}
< \frac{2\pi(r/2)T}{q^{2}},
\tag{A--1}
\end{equation}
where $p \gg \varepsilon$ has been assumed, which holds true for our experiments. This
phase shift needs to be minimized to obtain a proper reconstruction of the period when
$q$ happens to be the value closest to the true period, and it can be controlled by
varying the resolution $r$ over the whole range. We choose $r$ such that the phase shift
accumulated over all the periods during the folding process is just a small fraction of a
complete cycle, i.e., the period reconstruction is not noisy.
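As a quick numerical sanity check of the approximation in Equation A–1, the exact accumulated phase drift can be compared against the $2\pi|\varepsilon|T/q^{2}$ form. The baseline, period, and error below are made-up values for illustration, not taken from the data sets:

```python
import numpy as np

# Made-up values: baseline T (days), true period p (days), trial-period error eps
T, p, eps = 1000.0, 3.2764, 1e-3
q = p + eps

# Exact drift: T/q foldings, each slipping by eps in time,
# converted to phase by dividing by the true period p
drift = (T / q) * eps * 2 * np.pi / p

# Approximation used in Equation A-1, valid when p >> eps
approx = 2 * np.pi * abs(eps) * T / q**2
```

For these values the two expressions agree to within a relative error of roughly $\varepsilon/p \approx 3\times10^{-4}$, confirming that the $p \gg \varepsilon$ approximation is harmless at the resolutions used here.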
Hence we set this allowable phase shift as
\begin{equation}
0.005 \times 2\pi = 0.01\pi.
\tag{A--2}
\end{equation}
From Equations A--1 and A--2 we get
$\frac{2\pi(r/2)T}{q^{2}} = 0.01\pi$. Solving for $r$ gives
\begin{equation}
r = \frac{0.01\,q^{2}}{T}.
\tag{A--3}
\end{equation}
Thus the step size r about a trial value q is determined by Equation A–3. Hence in the
algorithm a smaller step size is used while scanning over smaller trial period values
and a larger step size over larger trial period values.
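Equation A–3 translates directly into a variable-step scan. The sketch below builds a trial-period grid with $r = 0.01\,q^{2}/T$; the baseline of $T = 2000$ days and the scan range are hypothetical placeholders, not the values used in the experiments:

```python
def trial_periods(q_min, q_max, T):
    """Generate trial periods from q_min to q_max using the
    variable step r = 0.01 * q**2 / T (Equation A-3)."""
    grid = []
    q = q_min
    while q <= q_max:
        grid.append(q)
        q += 0.01 * q * q / T   # step grows quadratically with the trial period
    return grid

# Hypothetical baseline of T = 2000 days:
# near q = 3.2764 days the step is about 5.4e-5 days,
# near q = 88.94 days it is about 4.0e-2 days,
# so short periods are scanned far more finely than long ones.
grid = trial_periods(1.0, 100.0, 2000.0)
```

The quadratic growth of the step reflects the wider peaks at long trial periods seen in Figures 5-2 and 5-3: where the peak is broad, a coarse grid still cannot miss it.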
BIOGRAPHICAL SKETCH
Bibhu Prasad Mishra was born in India in 1987. He did his schooling in Rourkela
before joining engineering school. He began his engineering studies in August 2005 at IIT
Kharagpur, and in 2009 he graduated with honors, receiving his Bachelor of Technology
(B.Tech.) in electronics and ECE. Upon graduation he joined the University of Florida to
pursue a Master of Science degree in electrical and computer engineering. He has been
working with Dr. Príncipe in the Computational NeuroEngineering Laboratory (CNEL) since
spring 2010. He received his Master of Science degree from the Department of Electrical
and Computer Engineering at the University of Florida in 2011.