Time-Series Data Analysis
Statistical Methods
Fuzzy Logic in Time-Series Data Analysis
Chaos Theory in Time-Series Data Analysis
Application of Artificial Neural Networks to Time-Series Data Analysis
Goal of Time-Series Data Analysis
1) interpret data (discover the pattern of the data);
2) forecast (predict future values).
Main features of a time series
Trend component --- a long-term change that does not repeat within the time range being considered
Seasonal component (seasonality) --- regular fluctuations at systematic intervals
Noise (error) --- the irregular component
Measurement of the Fitness of a Forecasting Model
Plot: plot the forecast values against the observed values (shows in which regions the model fits and in which it does not)
Mean Error: the average of the errors, where each error is the difference between the one-period-ahead forecast and the observed value (positive and negative errors cancel out)
Mean Absolute Error: the average of the absolute errors (does not have the drawback of the Mean Error)
Sum of Squared Errors: the sum of the squared errors
Mean Squared Error: the average of the squared errors (outliers carry greater weight)
Percentage Error: the relative magnitude of the error to the observed value: PEt = 100(Xt - St) / Xt
Mean Percentage Error: the average of the percentage errors
Mean Absolute Percentage Error: the average of the absolute percentage errors
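The fitness measures above can be sketched in a few lines of Python. This is a minimal illustration; the two series passed in are hypothetical one-period-ahead forecasts and observations, not data from the lecture.

```python
# Hedged sketch: the forecast-accuracy metrics defined above.
def forecast_errors(observed, forecast):
    errors = [x - s for x, s in zip(observed, forecast)]
    n = len(errors)
    me   = sum(errors) / n                          # Mean Error (signs cancel)
    mae  = sum(abs(e) for e in errors) / n          # Mean Absolute Error
    sse  = sum(e * e for e in errors)               # Sum of Squared Errors
    mse  = sse / n                                  # Mean Squared Error
    pe   = [100.0 * (x - s) / x for x, s in zip(observed, forecast)]
    mpe  = sum(pe) / n                              # Mean Percentage Error
    mape = sum(abs(p) for p in pe) / n              # Mean Absolute Percentage Error
    return {"ME": me, "MAE": mae, "SSE": sse, "MSE": mse, "MPE": mpe, "MAPE": mape}

metrics = forecast_errors([100, 110, 120], [90, 115, 120])
```

Note how the Mean Error understates the misfit here: the +10 and -5 errors partly cancel, while the Mean Absolute Error does not let them.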
Statistical Methods
Exponential smoothing
ARIMA (AutoRegressive Integrated Moving Average model)
Statistical Methods - Exponential Smoothing
Smoothing technique --- used to reduce noise; can also remove seasonality to make the trend component salient
Exponential smoothing --- a tool for noise filtering and forecasting
Simple exponential smoothing --- the best model for one-period-ahead forecasting among 25 time-series methods (Makridakis et al.)
Statistical Methods - Exponential Smoothing
Formulas of exponential smoothing:
Xt = b + εt
St = αXt + (1 - α)St-1
Where εt: error
b: constant
Xt: observed value at time t
St: smoothed value at time t
α: smoothing constant
Statistical Methods - Exponential Smoothing
Smoothed value at time t --- weighted average of the observed value at time t and the smoothed value at time t-1
Smoothed value at time t-1 --- weighted average of the observed value at time t-1 and the smoothed value at time t-2
The current smoothed value --- a weighted average of all previous observed values (plug in all the Si)
Weight of a previous observed value --- decreases exponentially with time: the earlier the observed value, the smaller its weight
Statistical Methods- Exponential smoothing) = 0---the current observed value is
ignored; Current smoothed vaule= the initial smoothed value (recursively using the formula, get the same smoothed value for all t)
= 1---no smoothing is applied choose ---1) between 0 and 1; 2) make
the sum of squares of residuals the smallest
Statistical Methods - Exponential Smoothing
S0 (initial smoothed value) --- has great influence on forecasts when α is close to zero
--- has little effect when there are many observed values before the forecast
Choose S0 --- so that the model produces the best results
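The recursion and the choice of α can be sketched as follows. This is a minimal illustration: the data series and the 1%-step grid for α are assumptions, and S0 is simply set to the first observation.

```python
# Hedged sketch of simple exponential smoothing, St = alpha*Xt + (1 - alpha)*St-1,
# with alpha chosen by grid search over the sum of squared one-step residuals.
def smooth(data, alpha, s0):
    s, out = s0, []
    for x in data:
        s = alpha * x + (1 - alpha) * s   # the smoothing recursion
        out.append(s)
    return out

def sse_one_step(data, alpha, s0):
    # St-1 serves as the one-period-ahead forecast of Xt
    s, total = s0, 0.0
    for x in data:
        total += (x - s) ** 2
        s = alpha * x + (1 - alpha) * s
    return total

data = [10, 12, 11, 13, 14, 13, 15, 16]   # illustrative series
best_alpha = min((k / 100 for k in range(1, 100)),
                 key=lambda a: sse_one_step(data, a, s0=data[0]))
```

The grid search implements rule 2) above literally: among candidate values of α strictly between 0 and 1, keep the one with the smallest sum of squared residuals.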
Statistical Methods-ARIMA
ARIMA (AutoRegressive Integrated Moving Average model)
Box and Jenkins (1976) --- can interpret and forecast complex time series (to get satisfactory results, the researcher must have a great deal of experience)
ARIMA(p, d, q) --- the model has p autoregressive parameters and q moving average parameters after the series is differenced d times
Statistical Methods-ARIMA
Autoregressive process: each value is viewed as determined by the previous values:
Xt = c + φ1Xt-1 + φ2Xt-2 + … + εt
Where Xt: observed value at time t
c: constant
φi: autoregressive parameters
εt: noise
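An autoregressive process can be simulated and its parameter recovered in a few lines. This is a minimal sketch for the AR(1) case: the constant, the parameter value 0.7, and the noise scale are illustrative, and the fit is the simple lag-1 least-squares estimate, not the full Box-Jenkins procedure.

```python
# Hedged sketch: simulate Xt = c + phi*Xt-1 + eps_t and estimate phi.
import random

random.seed(1)
c, phi = 0.5, 0.7                     # illustrative constant and AR(1) parameter
x = [c / (1 - phi)]                   # start at the process mean
for _ in range(2000):
    x.append(c + phi * x[-1] + random.gauss(0, 0.1))

# least-squares (lag-1 autocorrelation) estimate of phi
mean = sum(x) / len(x)
num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, len(x)))
den = sum((x[t - 1] - mean) ** 2 for t in range(1, len(x)))
phi_hat = num / den                   # should land near the true phi = 0.7
```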
Statistical Methods-ARIMA
Moving average process: each value is viewed as determined by the current and previous noises:
Xt = μ + εt - θ1εt-1 - θ2εt-2 - …
Where μ: constant
θi: moving average parameters
εt: noise
Statistical Methods-ARIMA
Differencing --- a procedure that transforms each i'th value into its difference from the (i-k)'th value. k is the lag, which is the length of the seasonality; that is, the time series repeats itself every k units of time
--- removes some seasonality, so that other seasonal components or the trend component become clearer
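Lag-k differencing is a one-liner. The series below is an assumption for illustration: a linear trend plus a period-4 seasonal pattern, so differencing at lag 4 removes the seasonality and a further lag-1 difference removes the remaining trend.

```python
# Hedged sketch: lag-k differencing as described above.
def difference(series, k=1):
    return [series[i] - series[i - k] for i in range(k, len(series))]

# Illustrative series: linear trend t plus a seasonal pattern of period 4
raw = [t + [0, 5, 2, 8][t % 4] for t in range(12)]
deseasoned = difference(raw, k=4)   # seasonal pattern cancels; trend remains
detrended  = difference(deseasoned, k=1)   # constant trend step cancels too
```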
Statistical Methods-ARIMA
Empirically, one of the following five models yields good results:
1) one autoregressive parameter
2) two autoregressive parameters
3) one moving average parameter
4) two moving average parameters
5) one autoregressive parameter and one moving average parameter
Fuzzy Logic
An alternative to traditional notions of set membership and logic
1. Truth values (membership values) are indicated by a value in the range [0.0, 1.0], with 0.0 representing absolute falseness and 1.0 absolute truth.
2. Hedge: a modifier of fuzzy values: 'very', 'more or less', 'somewhat', 'rather'
Fuzzy Logic
Traditional logic (Boolean logic) or set membership --- has only two values: 0 or 1 (false or true) {"Law of the Excluded Middle" --- things cannot be both true and false simultaneously}
Example --- Jack's height is 185 cm; can we say that Jack is tall? Compared with the mean height of people in the U.S., Jack is tall; but if Jack is a basketball player, he is deemed short
Fuzzy Logic
1965, Lotfi A. Zadeh (the "Father of Fuzzy Logic") published "Fuzzy Sets"
Zadeh defined fuzzy logic as "the logic of approximate reasoning with (traditional) precise reasoning as the limiting case." --- crisp sets (traditional sets) are included in fuzzy sets; they are just the extremes
Fuzzy Logic
Membership function --- maps a set to [0, 1]; 0 represents absolute falseness, 1 represents absolute truth
Example --- the set OLD. If Jack is 100 years old, he might be assigned 1; if he is 41 years old, he might be assigned 0; he might be assigned 0.8 if he is 75 years old. These values are called truth values or membership values, and the membership function for the last case might be written as mOLD(Jack) = 0.8.
Fuzzy Logic
Difference between fuzzy logic and probability --- in fuzzy logic, 0.8 means the degree to which Jack belongs to the set OLD; a probability of 0.8 means there is an 80% chance that Jack belongs to the set OLD
Fuzzy Logic
Hedges (modifiers) --- "very", "somewhat", "rather", "more or less"
The definitions of hedges are subjective (as are the membership values)
Example --- "very" is usually defined as mVERY-A(x) = mA(x)^2. Thus mVERYOLD(Jack) = 0.8^2 = 0.64.
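A membership function and the "very" hedge can be written directly. This is a minimal sketch: the piecewise-linear shape between the breakpoints 41 and 100 years is an assumption (the lecture's example assigns 0.8 at age 75, which a different curve shape would produce), but the hedge definition follows the formula above.

```python
# Hedged sketch: a membership function for the fuzzy set OLD plus the
# "very" concentration hedge m_VERY_A(x) = m_A(x)**2.
def m_old(age):
    # 0 up to 41 years, 1 from 100 years, linear in between (assumed shape)
    if age <= 41:
        return 0.0
    if age >= 100:
        return 1.0
    return (age - 41) / (100 - 41)

def very(membership):
    # squaring concentrates the set: high memberships stay high,
    # partial memberships shrink
    return membership ** 2
```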
Fuzzy Logic
Uncertain (not random) factors --- imperfections in data collection, wars, fluctuations of consumer prices, stock prices, sales revenues...
Fuzzy Logic
DataX software suite (Zaptron Systems) --- the average prediction accuracy is 97%
X(t) = Fl(t) + Fc(t) + Fu(t) + e(t)
Where Fl(t): trend component
Fc(t): seasonality
Fu(t): fuzziness
e(t): error
Fuzzy Logic
Example: predict the purchase trend and the customers' satisfaction level
Fuzzy Logic
Consumption utility function --- a mathematical description of a consumer's satisfaction level from consumption; takes values in [0, 1]
Consumption behaviors --- determined by both the physical properties of the goods and personal factors of the consumers. Physical properties include the quality, appearance, material, etc. of the goods. Personal factors include a person's preferences, psychological factors, etc. For example, the consumption utility of a hamburger to a hungry person may be 1, while that to a person with a full stomach may be 0.
A fuzzy concept
Fuzzy Logic
U = w1*U1 + w2*U2 + … + wN*UN
Ui: the utility of the i'th goods
For a given situation, Fi maps the spending on the i'th goods to [0, 1]
Xi: the spending on the i'th goods
Ui = 0 if Xi = 0; Ui approaches 1 as Xi grows without bound
Ui must be increasing with respect to Xi, and the marginal increase must be decreasing
Fi is determined by the parameters Di, Pi, Si, C
Di: a parameter determined by the consumer's subjective evaluation of the consumption
Pi: price of the i'th goods
Qi = Di * Pi: the consumer's subjective measure of the value of a consumption
Si: percentage of spending on the i'th goods, Si = Ai + Bi / C
Ai: limit spending percentage of the i'th goods in total spending; limit spending occurs when one's spending is big enough, i.e., he or she can spend as much as he or she likes without the constraint of resources
Bi: the trend of change in Ai caused by change in personal income
C: total amount spent on the N goods
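The weighted-utility structure can be sketched as follows. This is only an illustration of the shape of the model: the lecture does not specify the maps Fi, so the saturating exponential 1 - exp(-x/s) stands in for them (it satisfies Ui = 0 at Xi = 0, Ui -> 1 for large Xi, and diminishing marginal utility), and the spending amounts, scales, and weights are all assumed.

```python
# Hedged sketch of U = w1*U1 + w2*U2 + ... + wN*UN with an assumed
# per-item utility curve standing in for the unspecified maps F_i.
import math

def item_utility(spending, scale):
    # U_i = 0 at X_i = 0 and U_i -> 1 as X_i grows, with diminishing
    # marginal utility (assumed saturating-exponential form)
    return 1.0 - math.exp(-spending / scale)

def total_utility(spending, scales, weights):
    # weighted sum over the N goods
    return sum(w * item_utility(x, s)
               for x, s, w in zip(spending, scales, weights))

u = total_utility(spending=[100, 50, 30],   # illustrative spending per category
                  scales=[80, 40, 60],
                  weights=[0.5, 0.3, 0.2])
```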
Fuzzy Logic
The DataX software suite has special modules to compute {Di, Qi, Si, Ai, Bi, wi} based on maximum-utility principles.
Raw data --- annual average consumption in six categories (food, clothing, energy, housing, supplies, and entertainment) from 1980 to 1984, by consumers in a remote rural area
Assumption --- the annual spending rate increases by 8% and the population increases by 0.9%
Forecast --- purchase trend and consumer satisfaction level for the years 1985, 1990, 1995, and 2000
Chaos Theory
Chaos --- describes unpredictable behaviors
Cause --- high sensitivity to initial conditions (high divergence of the system even though the two starting points are arbitrarily close)
--- Each disturbance in a system may be small, but the change it creates may increase immensely with time (the "butterfly effect" --- the flapping of a butterfly's wings in Beijing, China may eventually cause a minor weather change in New York, USA)
Impossible --- to predict the exact state of a system
Possible --- to model the whole system (chaos theory analyzes the order inherent in the system)
Edward Lorenz --- long-term prediction of weather is impossible (the "butterfly effect")
Chaos Theory
Attractor --- a set of states that the system eventually "settles down to" (may be a single point or an infinite number of points, and can have as many dimensions as the variables of the system)
Basin of an attractor --- the region of starting points that lead to that attractor
Repellor --- a point the system moves away from
Saddle point --- a point that is an attractor for some regions and a repellor for others
Chaos Theory
Return map --- a plot of the data at time t against the data at time t + delta(t)
Chaos Theory---Logistic Map in the chaotic region, r=3.99
Chaos Theory---Return Map from Logistic Map, r=3.99
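The logistic map referenced in the figures above, x(t+1) = r*x(t)*(1 - x(t)) with r = 3.99, makes sensitivity to initial conditions easy to demonstrate: two orbits starting a millionth apart end up far apart. The starting values and orbit length below are illustrative.

```python
# Hedged sketch: the logistic map in the chaotic regime (r = 3.99) and the
# divergence of two nearby orbits (the "butterfly effect" in miniature).
def logistic_orbit(x0, r=3.99, n=50):
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_orbit(0.400000)
b = logistic_orbit(0.400001)          # perturbed by one part in a million
gap = max(abs(u - v) for u, v in zip(a[-20:], b[-20:]))   # grows large

# The return map is just the pairs (x_t, x_{t+1}); for the logistic map
# they lie on the parabola x_{t+1} = r * x_t * (1 - x_t).
pairs = list(zip(a, a[1:]))
```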
Artificial Neural Networks
First appeared --- early 1950's
John Hopfield --- 1982, led to a flourishing of neural networks
--- showed that a certain kind of highly interconnected network could be considered a dynamic system containing "energy". The process by which the network starts from some random state and settles into a stable state is just like that of an unstable physical system eventually reaching a state of minimal energy. This process is very efficient, and it led to the revival of the field.
Artificial Neural Networks
Artificial neural network system --- an assembly of connected simple processing elements
Cells or neurons --- these processing elements
Internal links --- the interconnections
The structure of a neural network system is a simplified version of the structure of an animal's brain, hence the name "neural network"
Artificial Neural Network
A neural network --- a mathematical function that computes an output based on a set of input values
--- each neuron involves a mathematical function that uses weights to convert inputs to an output value
--- each neuron can have only one output at a given time, though that output may be broadcast to several other neurons at the same time
Artificial Neural Network
y = b+w1x1+w2x2+…+wnxn
x1, x2, …, xn --- the input signals
w1, w2, …, wn --- the corresponding weights
b --- bias (changes the output independently of the inputs)
Activation function --- converts y into the output: z = f(y)
--- usually the logistic sigmoid function is used: f(y) = 1/(1 + exp(-y))
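A single neuron is these two formulas composed. This minimal sketch uses illustrative inputs, weights, and bias.

```python
# Hedged sketch: one neuron, y = b + w1*x1 + ... + wn*xn, then the
# logistic sigmoid activation z = 1/(1 + exp(-y)).
import math

def neuron(inputs, weights, bias):
    y = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-y))   # output always lies in (0, 1)

z = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
```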
Artificial Neural Network
Training of the network --- the weights are adapted to the input data set (e.g., the error back-propagation learning method)
--- small changes of an input signal will not change the output of a neuron dramatically
--- changes of a weight will only affect the output for a certain number of input patterns
Artificial Neural Network
End of training --- the error falls below a certain level
--- or training has been performed for a certain number of epochs (an epoch is a full pass through the training set)
Overfitting --- occurs when a neural network is trained too much on a given data set: the network has low generalization, follows the training data too closely, and cannot adapt to new input patterns correctly
Artificial Neural Network
Three components:
(1) architecture: determines how the neurons are connected;
(2) learning algorithm: the method of determining the weights and the bias for each neuron;
(3) activation function: the mathematical function that determines how the input is transformed into the output.
Artificial Neural Network
Only neurons in adjacent layers may communicate with each other, not neurons in the same layer
The network is connected to the outside world by the input (first) layer and the output (last) layer
Layers that are neither input nor output layers are referred to as hidden layers, because they are not visible from the outside
Artificial Neural Network
Feedforward neural network --- data do not go back to earlier neurons (data flow only one way)
Recurrent neural network --- data may be processed by earlier neurons again (data travel in both directions)
--- the states of a recurrent neural network keep changing until an equilibrium is reached
Artificial Neural Network
Fixed weights --- no learning is involved;
Supervised learning --- each input vector is associated with a target output vector;
Unsupervised learning --- no target output vector is specified
Artificial Neural Network
For a time-series data set, to use the neural network technique for forecasting, the data set is divided into two subsets: one is treated as old data and is used to train the neural network; the other is used for further forecasting
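The split described above can be sketched as follows. The window length, split ratio, and series are illustrative, and the "predictor" here is a trivial mean-of-window stand-in, not a trained network; in practice back-propagation training would replace it.

```python
# Hedged sketch: turn a time series into (window -> next value) pairs,
# train on the earlier part, hold out the rest for forecasting.
def make_pairs(series, window=3):
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

series = [float(t) for t in range(20)]          # illustrative series
pairs = make_pairs(series)
split = int(0.8 * len(pairs))                   # earlier 80% = "old data"
train, test = pairs[:split], pairs[split:]

def predict(window):
    # stand-in for the trained network: mean of the window
    return sum(window) / len(window)

# out-of-sample mean absolute error on the held-out pairs
test_mae = sum(abs(predict(w) - target) for w, target in test) / len(test)
```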
Artificial Neural Network
Attrasoft --- designed software based on the Hopfield neural model and the Boltzmann machine for stock prediction and lottery forecasting
Chase Manhattan Bank --- implemented Creditview, a neural network to reduce risk on loans to public and private corporations; the system forecasts a company's loan risk for three years
Alela Corp --- developed a method to predict a stock market index based on neural networks