Time-Series Data Analysis
Statistical Methods
Fuzzy Logic in Time-Series Data Analysis
Chaos Theory in Time-Series Data Analysis
Application of Artificial Neural Networks to Time-Series Data Analysis
Goal of Time-Series Data Analysis
1) interpret data (discover the pattern of the data);
2) forecast (predict future values).
Main features of a time series
Trend component --- a long-term change that does not repeat within the time range being considered
Seasonal component (seasonality) --- regular fluctuations at systematic intervals
Noise (error) --- the irregular component
Measurement of the Fitness of a Forecasting Model
Plot: plot the forecast values against the observed values (shows in which regions the model fits and in which it does not)
Mean Error: the average of the errors, where each error is the difference between the one-period-ahead forecast and the observed value (positive and negative errors cancel out)
Mean Absolute Error: the average of the absolute errors (does not have the drawback of the Mean Error)
Sum of Squared Errors: the sum of the squared errors
Mean Squared Error: the average of the squared errors (outliers carry greater weight)
Percentage Error: the relative magnitude of the error to the observed value: PEt = 100(Xt - St) / Xt
Mean Percentage Error: the average of the percentage errors
Mean Absolute Percentage Error: the average of the absolute percentage errors
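The fitness measures above can be sketched in a few lines of Python. This is a minimal illustration; the two series passed in are hypothetical one-period-ahead forecasts and observations, not data from the lecture.

```python
# Hedged sketch: the forecast-accuracy metrics defined above.
def forecast_errors(observed, forecast):
    errors = [x - s for x, s in zip(observed, forecast)]
    n = len(errors)
    me   = sum(errors) / n                          # Mean Error (signs cancel)
    mae  = sum(abs(e) for e in errors) / n          # Mean Absolute Error
    sse  = sum(e * e for e in errors)               # Sum of Squared Errors
    mse  = sse / n                                  # Mean Squared Error
    pe   = [100.0 * (x - s) / x for x, s in zip(observed, forecast)]
    mpe  = sum(pe) / n                              # Mean Percentage Error
    mape = sum(abs(p) for p in pe) / n              # Mean Absolute Percentage Error
    return {"ME": me, "MAE": mae, "SSE": sse, "MSE": mse, "MPE": mpe, "MAPE": mape}

metrics = forecast_errors([100, 110, 120], [90, 115, 120])
```

Note how the Mean Error understates the misfit here: the +10 and -5 errors partly cancel, while the Mean Absolute Error does not let them.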
Statistical Methods
Exponential smoothing
ARIMA (AutoRegressive Integrated Moving Average model)
Statistical Methods - Exponential Smoothing
Smoothing technique --- used to reduce noise; can also remove seasonality to make the trend component salient
Exponential smoothing --- a tool for noise filtering and forecasting
Simple exponential smoothing --- the best model for one-period-ahead forecasting among 25 time-series methods (Makridakis et al.)
Statistical Methods - Exponential Smoothing
Formulas of exponential smoothing:
Xt = b + εt
St = αXt + (1 - α)St-1
Where εt: error
b: constant
Xt: observed value at time t
St: smoothed value at time t
α: smoothing constant
Statistical Methods - Exponential Smoothing
Smoothed value at time t --- weighted average of the observed value at time t and the smoothed value at time t-1
Smoothed value at time t-1 --- weighted average of the observed value at time t-1 and the smoothed value at time t-2
The current smoothed value --- a weighted average of all previous observed values (plug in all the Si)
Weight of a previous observed value --- decreases exponentially with time: the earlier the observed value, the smaller its weight
Statistical Methods- Exponential smoothing) = 0---the current observed value is
ignored; Current smoothed vaule= the initial smoothed value (recursively using the formula, get the same smoothed value for all t)
= 1---no smoothing is applied choose ---1) between 0 and 1; 2) make
the sum of squares of residuals the smallest
Statistical Methods - Exponential Smoothing
S0 (initial smoothed value) --- has great influence on forecasts when α is close to zero
--- has little effect when there are many observed values before the forecast
Choose S0 --- so that the model produces the best results
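The recursion and the choice of α can be sketched as follows. This is a minimal illustration: the data series and the 1%-step grid for α are assumptions, and S0 is simply set to the first observation.

```python
# Hedged sketch of simple exponential smoothing, St = alpha*Xt + (1 - alpha)*St-1,
# with alpha chosen by grid search over the sum of squared one-step residuals.
def smooth(data, alpha, s0):
    s, out = s0, []
    for x in data:
        s = alpha * x + (1 - alpha) * s   # the smoothing recursion
        out.append(s)
    return out

def sse_one_step(data, alpha, s0):
    # St-1 serves as the one-period-ahead forecast of Xt
    s, total = s0, 0.0
    for x in data:
        total += (x - s) ** 2
        s = alpha * x + (1 - alpha) * s
    return total

data = [10, 12, 11, 13, 14, 13, 15, 16]   # illustrative series
best_alpha = min((k / 100 for k in range(1, 100)),
                 key=lambda a: sse_one_step(data, a, s0=data[0]))
```

The grid search implements rule 2) above literally: among candidate values of α strictly between 0 and 1, keep the one with the smallest sum of squared residuals.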
Statistical Methods-ARIMA
ARIMA (AutoRegressive Integrated Moving Average model)
Box and Jenkins (1976) --- can interpret and forecast complex time series (to get satisfactory results, the researcher must have a great deal of experience)
ARIMA(p, d, q) --- the model has p autoregressive parameters and q moving average parameters after the series is differenced d times
Statistical Methods-ARIMA
Autoregressive process: each value is viewed as determined by the previous values:
Xt = c + φ1Xt-1 + φ2Xt-2 + … + εt
Where Xt: observed value at time t
c: constant
φi: autoregressive parameters
εt: noise
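An autoregressive process can be simulated and its parameter recovered in a few lines. This is a minimal sketch for the AR(1) case: the constant, the parameter value 0.7, and the noise scale are illustrative, and the fit is the simple lag-1 least-squares estimate, not the full Box-Jenkins procedure.

```python
# Hedged sketch: simulate Xt = c + phi*Xt-1 + eps_t and estimate phi.
import random

random.seed(1)
c, phi = 0.5, 0.7                     # illustrative constant and AR(1) parameter
x = [c / (1 - phi)]                   # start at the process mean
for _ in range(2000):
    x.append(c + phi * x[-1] + random.gauss(0, 0.1))

# least-squares (lag-1 autocorrelation) estimate of phi
mean = sum(x) / len(x)
num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, len(x)))
den = sum((x[t - 1] - mean) ** 2 for t in range(1, len(x)))
phi_hat = num / den                   # should land near the true phi = 0.7
```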
Statistical Methods-ARIMA
Moving average process: each value is viewed as determined by the current and previous noises:
Xt = μ + εt - θ1εt-1 - θ2εt-2 - …
Where μ: constant
θi: moving average parameters
εt: noise
Statistical Methods-ARIMA
Differencing --- a procedure that transforms each i'th value into its difference from the (i-k)'th value. k is the lag, which is the length of the seasonality; that is, the time series repeats itself every k units of time
--- removes some seasonality, so that other seasonal components or the trend component become clearer
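Lag-k differencing is a one-liner. The series below is an assumption for illustration: a linear trend plus a period-4 seasonal pattern, so differencing at lag 4 removes the seasonality and a further lag-1 difference removes the remaining trend.

```python
# Hedged sketch: lag-k differencing as described above.
def difference(series, k=1):
    return [series[i] - series[i - k] for i in range(k, len(series))]

# Illustrative series: linear trend t plus a seasonal pattern of period 4
raw = [t + [0, 5, 2, 8][t % 4] for t in range(12)]
deseasoned = difference(raw, k=4)   # seasonal pattern cancels; trend remains
detrended  = difference(deseasoned, k=1)   # constant trend step cancels too
```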
Statistical Methods-ARIMA
Empirically, one of the following five models yields good results:
1) one autoregressive parameter
2) two autoregressive parameters
3) one moving average parameter
4) two moving average parameters
5) one autoregressive parameter and one moving average parameter
Fuzzy Logic
An alternative to traditional notions of set membership and logic
1. Truth values (membership values) are indicated by a value in the range [0.0, 1.0], with 0.0 representing absolute falseness and 1.0 absolute truth.
2. Hedge: a modifier of fuzzy values: 'very', 'more or less', 'somewhat', 'rather'
Fuzzy Logic
Traditional logic (Boolean logic) or set membership --- has only two values: 0 or 1 (false or true) {"Law of the Excluded Middle" --- things cannot be both true and false simultaneously}
Example --- Jack's height is 185 cm; can we say that Jack is tall? Compared with the mean height of people in the U.S., Jack is tall; but if Jack is a basketball player, he is deemed short
Fuzzy Logic
1965, Lotfi A. Zadeh (the "Father of Fuzzy Logic") published "Fuzzy Sets"
Zadeh defined fuzzy logic as "the logic of approximate reasoning with (traditional) precise reasoning as the limiting case." --- crisp sets (traditional sets) are included in fuzzy sets; they are just the extremes
Fuzzy Logic
Membership function --- maps a set to [0, 1]; 0 represents absolute falseness, 1 represents absolute truth
Example --- the set OLD. If Jack is 100 years old, he might be assigned 1; if he is 41 years old, he might be assigned 0; he might be assigned 0.8 if he is 75 years old. These values are called truth values or membership values, and the membership function for the last case might be written as mOLD(Jack) = 0.8.
Fuzzy Logic
Difference between fuzzy logic and probability --- in fuzzy logic, 0.8 means the degree to which Jack belongs to the set OLD; a probability of 0.8 means there is an 80% chance that Jack belongs to the set OLD
Fuzzy Logic
Hedges (modifiers) --- "very", "somewhat", "rather", "more or less"
The definitions of hedges are subjective (as are the membership values)
Example --- "very" is usually defined as mVERY-A(x) = mA(x)^2. Thus mVERYOLD(Jack) = 0.8^2 = 0.64.
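A membership function and the "very" hedge can be written directly. This is a minimal sketch: the piecewise-linear shape between the breakpoints 41 and 100 years is an assumption (the lecture's example assigns 0.8 at age 75, which a different curve shape would produce), but the hedge definition follows the formula above.

```python
# Hedged sketch: a membership function for the fuzzy set OLD plus the
# "very" concentration hedge m_VERY_A(x) = m_A(x)**2.
def m_old(age):
    # 0 up to 41 years, 1 from 100 years, linear in between (assumed shape)
    if age <= 41:
        return 0.0
    if age >= 100:
        return 1.0
    return (age - 41) / (100 - 41)

def very(membership):
    # squaring concentrates the set: high memberships stay high,
    # partial memberships shrink
    return membership ** 2
```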
Fuzzy Logic
Uncertain (not random) factors --- imperfections in data collection, wars, fluctuations of consumer prices, stock prices, sales revenues...
Fuzzy Logic
DataX software suite (Zaptron Systems) --- the average prediction accuracy is 97%
X(t) = Fl(t) + Fc(t) + Fu(t) + e(t)
Where Fl(t): trend component
Fc(t): seasonality
Fu(t): fuzziness
e(t): error
Fuzzy Logic
Example: predict the purchase trend and the customers' satisfaction level
Fuzzy Logic
Consumption utility function --- a mathematical description of a consumer's satisfaction level from consumption; takes values in [0, 1]
Consumption behaviors --- determined by both the physical properties of the goods and personal factors of the consumers. Physical properties include the quality, appearance, material, etc. of the goods. Personal factors include a person's preferences, psychological factors, etc. For example, the consumption utility of a hamburger to a hungry person may be 1, while that to a person with a full stomach may be 0.
A fuzzy concept
Fuzzy Logic
U = w1*U1 + w2*U2 + … + wN*UN
Ui: the utility of the i'th goods
For a given situation, Fi maps the spending on the i'th goods to [0, 1]
Xi: the spending on the i'th goods
Ui = 0 if Xi = 0; Ui approaches 1 as Xi grows without bound
Ui must be increasing with respect to Xi, and the marginal increase must be decreasing
Fi is determined by the parameters Di, Pi, Si, C
Di: a parameter determined by the consumer's subjective evaluation of the consumption
Pi: price of the i'th goods
Qi = Di * Pi: the consumer's subjective measure of the value of a consumption
Si: percentage of spending on the i'th goods, Si = Ai + Bi / C
Ai: limit spending percentage of the i'th goods in total spending; limit spending occurs when one's spending is big enough, i.e., he or she can spend as much as he or she likes without the constraint of resources
Bi: the trend of change in Ai caused by change in personal income
C: total amount spent on the N goods
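The weighted-utility structure can be sketched as follows. This is only an illustration of the shape of the model: the lecture does not specify the maps Fi, so the saturating exponential 1 - exp(-x/s) stands in for them (it satisfies Ui = 0 at Xi = 0, Ui -> 1 for large Xi, and diminishing marginal utility), and the spending amounts, scales, and weights are all assumed.

```python
# Hedged sketch of U = w1*U1 + w2*U2 + ... + wN*UN with an assumed
# per-item utility curve standing in for the unspecified maps F_i.
import math

def item_utility(spending, scale):
    # U_i = 0 at X_i = 0 and U_i -> 1 as X_i grows, with diminishing
    # marginal utility (assumed saturating-exponential form)
    return 1.0 - math.exp(-spending / scale)

def total_utility(spending, scales, weights):
    # weighted sum over the N goods
    return sum(w * item_utility(x, s)
               for x, s, w in zip(spending, scales, weights))

u = total_utility(spending=[100, 50, 30],   # illustrative spending per category
                  scales=[80, 40, 60],
                  weights=[0.5, 0.3, 0.2])
```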
Fuzzy Logic
The DataX software suite has special modules to compute {Di, Qi, Si, Ai, Bi, wi} based on maximum-utility principles.
Raw data --- annual average consumption in six categories (food, clothing, energy, housing, supplies, and entertainment) from 1980 to 1984, by consumers in a remote rural area
Assumption --- the annual spending rate increases by 8% and the population increases by 0.9%
Forecast --- purchase trend and consumer satisfaction level for the years 1985, 1990, 1995, and 2000
Chaos Theory
Chaos --- describes unpredictable behaviors
Cause --- high sensitivity to initial conditions (high divergence of the system even though the two starting points are arbitrarily close)
--- Each disturbance in a system may be small, but the change it creates may increase immensely with time (the "butterfly effect" --- the flapping of a butterfly's wings in Beijing, China may eventually cause a minor weather change in New York, USA)
Impossible --- to predict the exact state of a system
Possible --- to model the whole system (chaos theory analyzes the order inherent in the system)
Edward Lorenz --- long-term prediction of weather is impossible (the "butterfly effect")
Chaos Theory
Attractor --- a set of states that the system eventually "settles down to" (may be a single point or an infinite number of points, and can have as many dimensions as the variables of the system)
Basin of an attractor --- the region of starting points that lead to that attractor
Repellor --- a point the system moves away from
Saddle point --- a point that is an attractor for some regions and a repellor for others
Chaos Theory
Return map --- a plot of the data at time t against the data at time t + delta(t)
Chaos Theory---Logistic Map in the chaotic region, r=3.99
Chaos Theory---Return Map from Logistic Map, r=3.99
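The logistic map referenced in the figures above, x(t+1) = r*x(t)*(1 - x(t)) with r = 3.99, makes sensitivity to initial conditions easy to demonstrate: two orbits starting a millionth apart end up far apart. The starting values and orbit length below are illustrative.

```python
# Hedged sketch: the logistic map in the chaotic regime (r = 3.99) and the
# divergence of two nearby orbits (the "butterfly effect" in miniature).
def logistic_orbit(x0, r=3.99, n=50):
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_orbit(0.400000)
b = logistic_orbit(0.400001)          # perturbed by one part in a million
gap = max(abs(u - v) for u, v in zip(a[-20:], b[-20:]))   # grows large

# The return map is just the pairs (x_t, x_{t+1}); for the logistic map
# they lie on the parabola x_{t+1} = r * x_t * (1 - x_t).
pairs = list(zip(a, a[1:]))
```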
Artificial Neural Networks
First appeared --- early 1950's
John Hopfield --- 1982, led to a flourishing of neural networks
--- showed that a certain kind of highly interconnected network could be considered a dynamic system containing "energy". The process by which the network starts from some random state and settles into a stable state is just like that of an unstable physical system eventually reaching a state of minimal energy. This process is very efficient, and it led to the revival of the field.
Artificial Neural Networks
Artificial neural network system --- an assembly of connected simple processing elements
Cells or neurons --- these processing elements
Internal links --- the interconnections
The structure of a neural network system is a simplified version of the structure of an animal's brain, hence the name "neural network"
Artificial Neural Network
A neural network --- a mathematical function that computes an output based on a set of input values
--- each neuron involves a mathematical function that uses weights to convert inputs to an output value
--- each neuron can have only one output at a given time, though that output may be broadcast to several other neurons at the same time
Artificial Neural Network
y = b+w1x1+w2x2+…+wnxn
x1, x2, …, xn --- the input signals
w1, w2, …, wn --- the corresponding weights
b --- bias (changes the output independently of the inputs)
Activation function --- converts y into the output: z = f(y)
--- usually the logistic sigmoid function is used: f(y) = 1/(1 + exp(-y))
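A single neuron is these two formulas composed. This minimal sketch uses illustrative inputs, weights, and bias.

```python
# Hedged sketch: one neuron, y = b + w1*x1 + ... + wn*xn, then the
# logistic sigmoid activation z = 1/(1 + exp(-y)).
import math

def neuron(inputs, weights, bias):
    y = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-y))   # output always lies in (0, 1)

z = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
```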
Artificial Neural Network
Training of the network --- the weights are adapted to the input data set (e.g., the error back-propagation learning method)
--- small changes of an input signal will not change the output of a neuron dramatically
--- changes of a weight will only affect the output for a certain number of input patterns
Artificial Neural Network
End of training --- the error falls below a certain level
--- or training has been performed for a certain number of epochs (an epoch is a full pass through the training set)
Overfitting --- occurs when a neural network is trained too much on a given data set: the network has low generalization, follows the training data too closely, and cannot adapt to new input patterns correctly
Artificial Neural Network
Three components:
(1) architecture: determines how the neurons are connected;
(2) learning algorithm: the method of determining the weights and the bias for each neuron;
(3) activation function: the mathematical function that determines how the input is transformed into the output.
Artificial Neural Network
Only neurons in adjacent layers may communicate with each other, not neurons in the same layer
The network is connected to the outside world by the input (first) layer and the output (last) layer
Layers that are neither input nor output layers are referred to as hidden layers, because they are not visible from the outside
Artificial Neural Network
Feedforward neural network --- data do not go back to earlier neurons (data flow only one way)
Recurrent neural network --- data may be processed by earlier neurons again (data travel in both directions)
--- the states of a recurrent neural network keep changing until an equilibrium is reached
Artificial Neural Network
Fixed weights --- no learning is involved;
Supervised learning --- each input vector is associated with a target output vector;
Unsupervised learning --- no target output vector is specified
Artificial Neural Network
For a time-series data set, to use the neural network technique for forecasting, the data set is divided into two subsets: one is treated as old data and is used to train the neural network; the other is used for further forecasting
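The split described above can be sketched as follows. The window length, split ratio, and series are illustrative, and the "predictor" here is a trivial mean-of-window stand-in, not a trained network; in practice back-propagation training would replace it.

```python
# Hedged sketch: turn a time series into (window -> next value) pairs,
# train on the earlier part, hold out the rest for forecasting.
def make_pairs(series, window=3):
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

series = [float(t) for t in range(20)]          # illustrative series
pairs = make_pairs(series)
split = int(0.8 * len(pairs))                   # earlier 80% = "old data"
train, test = pairs[:split], pairs[split:]

def predict(window):
    # stand-in for the trained network: mean of the window
    return sum(window) / len(window)

# out-of-sample mean absolute error on the held-out pairs
test_mae = sum(abs(predict(w) - target) for w, target in test) / len(test)
```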
Artificial Neural Network
Attrasoft --- designed software based on the Hopfield neural model and the Boltzmann machine for stock prediction and lottery forecasting
Chase Manhattan Bank --- implemented Creditview, a neural network to reduce risk on loans to public and private corporations; the system forecasts a company's loan risk for three years
Alela Corp --- developed a method to predict a stock market index based on neural networks