June 2003 Neural Computation for Time Series 1
Neural Computation and Applications in Time Series and Signal Processing

Georg Dorffner
Dept. of Medical Cybernetics and Artificial Intelligence, University of Vienna
and
Austrian Research Institute for Artificial Intelligence
Neural Computation
• Originally biologically motivated (information processing in the brain)
• Simple mathematical model of the neuron → neural network
• Large number of simple "units"
• Massively parallel (in theory)
• Complexity through the interplay of many simple elements
• Strong relationship to methods from statistics
• Suitable for pattern recognition
A Unit
• Propagation rule:
  – Weighted sum
  – Euclidean distance
• Transfer function f:
  – Threshold fct. (McCulloch & Pitts)
  – Linear fct.
  – Sigmoid fct.
  – Gaussian fct.

[Figure: a unit (neuron) with weights $w_1, w_2, \ldots, w_i$ on its inputs, the (net) input $x_j$, and the activation/output $y_j = f(x_j)$]
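A unit of this kind fits in a few lines. The sketch below assumes the weighted-sum propagation rule and implements the four transfer functions listed above; the weights and inputs are purely illustrative.

```python
import math

# one unit: weighted-sum propagation rule plus a choice of transfer function f
# (weights and inputs below are illustrative)
def unit_output(inputs, weights, transfer="sigmoid", threshold=0.0):
    net = sum(w * x for w, x in zip(weights, inputs))  # (net) input x_j
    if transfer == "threshold":                        # McCulloch & Pitts
        return 1.0 if net >= threshold else 0.0
    if transfer == "linear":
        return net
    if transfer == "sigmoid":
        return 1.0 / (1.0 + math.exp(-net))
    if transfer == "gaussian":
        return math.exp(-net * net)
    raise ValueError(transfer)

print(unit_output([1.0, 2.0], [0.5, -0.25], "linear"))   # 0.0
print(unit_output([0.0], [1.0], "sigmoid"))              # 0.5
```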
Multilayer Perceptron (MLP), Radial Basis Function Network (RBFN)
• 2 (or more) layers (= connections)
• Input units → hidden units (typically nonlinear) → output units (typically linear)

MLP: $x_j^{\text{out}} = f_2\Big(\sum_l v_{jl}\, f_1\big(\sum_i w_{li}\, x_i^{\text{in}}\big)\Big)$

RBFN: $x_j^{\text{out}} = \sum_l v_{jl} \exp\Big(-\frac{\lVert \mathbf{x}^{\text{in}} - \mathbf{w}_l \rVert^2}{2\sigma_l^2}\Big)$
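The two forward passes can be sketched side by side. This is a minimal illustration with random weights (not trained networks); the layer sizes and the shared output weights are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 2
x = rng.normal(size=n_in)                 # input units

# MLP: sigmoid hidden layer, linear output layer
W = rng.normal(size=(n_hid, n_in))        # first layer of connections (w_li)
V = rng.normal(size=(n_out, n_hid))       # second layer of connections (v_jl)
hidden_mlp = 1.0 / (1.0 + np.exp(-(W @ x)))
out_mlp = V @ hidden_mlp                  # linear output units

# RBFN: Gaussian hidden units around centres, linear output layer
centres = rng.normal(size=(n_hid, n_in))  # the hidden "weights" w_l act as centres
sigma = 1.0
hidden_rbf = np.exp(-np.sum((x - centres) ** 2, axis=1) / (2 * sigma ** 2))
out_rbf = V @ hidden_rbf

print(out_mlp.shape, out_rbf.shape)       # (2,) (2,)
```

The only structural difference is the hidden layer: a sigmoid of a weighted sum versus a Gaussian of a distance to a centre.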
MLP as Universal Function Approximator
• E.g.: 1 input, 1 output, 5 hidden units
• An MLP can approximate arbitrary functions (Hornik et al. 1990)
• through superposition of weighted sigmoids:

  $x_k^{\text{out}} = g_k(\mathbf{x}^{\text{in}}) = \sum_{j=1}^{m} w_{kj}^{\text{out}} f\Big(\sum_{i=1}^{n} w_{ji}^{\text{hid}} x_i^{\text{in}} + w_{j0}^{\text{hid}}\Big) + w_{k0}^{\text{out}}$

  the bias $w_{j0}$ moves each sigmoid; the weights stretch and mirror it
• The same is true for RBFNs
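The roles of weight and bias can be checked directly: a minimal sketch (the parameter values are illustrative) showing that the weight mirrors/stretches a sigmoid, the bias shifts it, and the output is a weighted sum of such units.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# one hidden unit: the weight w stretches (|w| > 1) or mirrors (w < 0) the sigmoid,
# the bias b moves it along the input axis
def hidden(x, w, b):
    return sigmoid(w * x + b)

# the output is a weighted sum of such units: the superposition used above
def g(x, params):
    return sum(v * hidden(x, w, b) for v, w, b in params)

print(hidden(1.0, -1.0, 0.0) == sigmoid(-1.0))  # True: w = -1 mirrors the curve
print(hidden(0.0, 2.0, 3.0) == sigmoid(3.0))    # True: b = 3 shifts it
```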
Training (Model Estimation)
• Typical error function (summed squared error):

  $E = \sum_{i=1}^{n} \sum_{k=1}^{m} (x_{ik}^{\text{out}} - t_{ik})^2$   (summed over all patterns $i$ and all outputs $k$; $t$ = target)

• "Backpropagation" (application of the chain rule):

  $\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial x^{\text{out}}} \cdot \frac{\partial x^{\text{out}}}{\partial w_i}$

  (contribution of the error function · contribution of the network)

  $\delta_j^{\text{out}} = f'(x_j^{\text{out}})\,(t_j - y_j^{\text{out}})$

  $\delta_j^{\text{hid}} = f'(x_j^{\text{hid}}) \sum_{k=1}^{n} \delta_k^{\text{out}} w_{kj}$

• Iterative optimisation based on the gradient (gradient descent, conjugate gradient, quasi-Newton)
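A minimal numeric check of these formulas, assuming a one-hidden-layer network with sigmoid hidden units and linear outputs (sizes and data are illustrative): the backpropagated gradient of one weight is compared with a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=3)                    # one input pattern
t = rng.normal(size=2)                    # its target
W = rng.normal(size=(4, 3))               # input -> hidden weights
V = rng.normal(size=(2, 4))               # hidden -> output weights

def forward(W, V):
    h = sigmoid(W @ x)                    # sigmoid hidden layer
    y = V @ h                             # linear output layer
    return h, y, np.sum((y - t) ** 2)     # summed squared error

h, y, E = forward(W, V)
delta_out = 2.0 * (y - t)                    # f' = 1 for linear output units
delta_hid = (V.T @ delta_out) * h * (1 - h)  # sigmoid' = h(1 - h)
grad_V = np.outer(delta_out, h)              # dE/dV
grad_W = np.outer(delta_hid, x)              # dE/dW

# sanity check: one backpropagated gradient against a numerical one
eps = 1e-6
W_pert = W.copy()
W_pert[0, 0] += eps
numeric = (forward(W_pert, V)[2] - E) / eps
print(abs(numeric - grad_W[0, 0]) < 1e-4)  # True
```

A gradient-descent step would then be `W -= lr * grad_W; V -= lr * grad_V`.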
Recurrent Perceptrons
• Recurrent connection = feedback loop
• From the hidden layer ("Elman") or the output layer ("Jordan")
• Learning: "backpropagation through time"

[Figure: input layer plus a state or context layer, filled by a copy of the hidden or output activations]
Time series processing
• Given: time-dependent observables $x_t,\ t = 0, 1, \ldots$
• Scalar: univariate; vector: multivariate
• Typical tasks:
  – Forecasting
  – Noise modeling
  – Pattern recognition
  – Modeling
  – Filtering
  – Source separation
• Time series (minutes to days) vs. signals (milliseconds to seconds)
Examples

• Standard & Poor's, preprocessed (returns): $r_t = x_t - x_{t-1}$
• Sunspots, preprocessed (de-seasoned): $s_t = x_t - x_{t-11}$
Autoregressive models
• Forecasting: making use of past information to predict (estimate) the future
• AR: Past information = past observations
  $x_t = F(x_{t-1}, x_{t-2}, \ldots, x_{t-p}) + \epsilon_t$

  with the past observations $X_{t,p} = (x_{t-1}, \ldots, x_{t-p})$, the expected value $\hat{x}_t = F(X_{t,p})$, and the noise ("random shock") $\epsilon_t$

• Best forecast: the expected value $\hat{x}_t$
Linear AR models
• Most common case:

  $x_t = \sum_{i=1}^{p} a_i x_{t-i} + \epsilon_t$

• Simplest form: random walk

  $x_t = x_{t-1} + \epsilon_t; \quad \epsilon_t \sim N(0, 1)$

• Nontrivial forecast impossible
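A quick simulation makes the random-walk case concrete (length and seed are arbitrary choices): the best forecast of $x_t$ is just $x_{t-1}$, so the forecast errors are exactly the shocks and average out near zero.

```python
import random

random.seed(0)
# random walk: x_t = x_{t-1} + eps_t, eps_t ~ N(0, 1)
x = [0.0]
for _ in range(1000):
    x.append(x[-1] + random.gauss(0, 1))

# forecasting x_t by x_{t-1} leaves only the shocks as errors
errors = [x[t] - x[t - 1] for t in range(1, len(x))]
mean_err = sum(errors) / len(errors)
print(abs(mean_err) < 0.2)  # True: nothing systematic is left to predict
```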
MLP as NAR
• A neural network can approximate a nonlinear AR model
• "time window" or "time delay" network: the past p observations serve as the input vector
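Building the time-window inputs for such a network is a simple sliding-window transformation; the helper below is an illustrative sketch.

```python
def time_windows(series, p):
    """Turn a series into (past p values, next value) training pairs for a NAR model."""
    X, y = [], []
    for t in range(p, len(series)):
        X.append(series[t - p:t])  # time window: the p most recent observations
        y.append(series[t])        # the value to forecast
    return X, y

X, y = time_windows([1, 2, 3, 4, 5], p=2)
print(X, y)  # [[1, 2], [2, 3], [3, 4]] [3, 4, 5]
```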
Noise modeling
• Regression is density estimation of (Bishop 1995):

  $p(\mathbf{t}, \mathbf{x}) = p(\mathbf{t} \mid \mathbf{x})\, p(\mathbf{x})$

• Likelihood:

  $L = \prod_{i=1}^{n} p(\mathbf{t}_i \mid \mathbf{x}_i)\, p(\mathbf{x}_i)$

  where $p(\mathbf{t} \mid \mathbf{x})$ is a distribution with expected value $F(\mathbf{x}_i)$; target = future, input = past
Gaussian noise
• Likelihood:

  $L = \prod_{t=1}^{n} p(x_t \mid X_{t,p}) = \prod_{t=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big(-\frac{(x_t - F(X_{t,p}; \mathbf{W}))^2}{2\sigma^2}\Big)$

• Maximization = minimization of $-\log L$ (constant terms can be dropped, incl. $p(\mathbf{x})$)
• Corresponds to the summed squared error (typical backpropagation):

  $E = \sum_{t=1}^{n} \big(x_t - F(X_{t,p}; \mathbf{W})\big)^2$
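The equivalence is easy to verify numerically: under unit-variance Gaussian noise, $-\log L$ equals a constant plus half the summed squared error, so both are minimised by the same weights. The observations and model outputs below are made up for illustration.

```python
import math

xs = [0.5, -1.2, 0.3]      # observations x_t (illustrative)
preds = [0.4, -1.0, 0.0]   # model outputs F(X_{t,p}; W) (illustrative)

# Gaussian likelihood with unit variance
neg_log_L = sum(0.5 * math.log(2 * math.pi) + 0.5 * (x - f) ** 2
                for x, f in zip(xs, preds))
sse = sum((x - f) ** 2 for x, f in zip(xs, preds))

# -log L = const + SSE / 2: minimising one minimises the other
const = len(xs) * 0.5 * math.log(2 * math.pi)
print(abs(neg_log_L - (const + sse / 2)) < 1e-12)  # True
```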
Complex noise models
• Assumption: an arbitrary distribution $\epsilon_t \sim D(\theta)$
• Parameters are time dependent (dependent on the past): $\theta_t = g(X_{t,p})$
• Likelihood:

  $L = \prod_{i=1}^{N} d\big(x_i; g(X_{i,p})\big)$

  where $d$ is the probability density function of $D$
Heteroskedastic time series
• Assumption: the noise is Gaussian with time-dependent variance

  $L = \prod_{t=1}^{N} \frac{1}{\sqrt{2\pi\,\sigma^2(X_{t,p})}} \exp\Big(-\frac{(x_t - F(X_{t,p}))^2}{2\sigma^2(X_{t,p})}\Big)$

• ARCH model:

  $\sigma_t^2 = \sum_{i=1}^{p} a_i r_{t-i}^2$

• An MLP is a nonlinear ARCH model (when applied to returns/residuals):

  $\hat{r}_t = F(r_{t-1}, r_{t-2}, \ldots, r_{t-p}), \quad \sigma_t^2 = F'(r_{t-1}, r_{t-2}, \ldots, r_{t-p})$
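The linear ARCH recursion above can be sketched directly; the coefficients and returns below are illustrative, not fitted values.

```python
# ARCH(p): sigma_t^2 = sum_{i=1..p} a_i * r_{t-i}^2
def arch_variance(returns, a):
    p = len(a)
    sig2 = []
    for t in range(p, len(returns)):
        sig2.append(sum(a[i] * returns[t - 1 - i] ** 2 for i in range(p)))
    return sig2

r = [0.1, -0.2, 0.05, 0.3]               # returns (illustrative)
sig2 = arch_variance(r, a=[0.5, 0.3])    # one sigma_t^2 per forecastable step
print(sig2)
```

Replacing the weighted sum of squared returns by a network output gives the nonlinear version.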
Non-Gaussian noise
• Other parametric pdfs (e.g. the t-distribution)
• Mixture of Gaussians (mixture density network, Bishop 1994):

  $d(x_t; X_{t,p}) = \sum_{i=1}^{k} \alpha_i(X_{t,p})\, \frac{1}{\sqrt{2\pi}\,\sigma_i(X_{t,p})} \exp\Big(-\frac{(x_t - \mu_i(X_{t,p}))^2}{2\sigma_i^2(X_{t,p})}\Big)$

• Network with 3k outputs (or separate networks): a mixing coefficient $\alpha_i$, a mean $\mu_i$, and a variance $\sigma_i^2$ per component
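Evaluating the mixture density itself is straightforward; in a mixture density network the 3k parameters would be network outputs conditioned on the past, while here they are fixed illustrative values.

```python
import math

def mixture_pdf(x, alphas, mus, sigmas):
    """Mixture of Gaussians: d(x) = sum_i alpha_i * N(x; mu_i, sigma_i^2)."""
    return sum(a / (math.sqrt(2 * math.pi) * s)
               * math.exp(-(x - m) ** 2 / (2 * s ** 2))
               for a, m, s in zip(alphas, mus, sigmas))

# illustrative parameters; an MDN would emit these as functions of X_{t,p}
d = mixture_pdf(0.0, alphas=[0.7, 0.3], mus=[0.0, 1.0], sigmas=[1.0, 0.5])
print(d)
```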
Identifiability problem

• Mixture models (like neural networks) are not identifiable (parameters cannot be interpreted)
• No distinction between model and noise, e.g. in the sunspot data
• Models have to be treated with care
Recurrent networks: Moving Average
• Second model class: moving average (MA) models
• Past information: random shocks

  $x_t = \sum_{i=0}^{q} b_i \epsilon_{t-i}$

• Recurrent (Jordan) network: nonlinear MA, feeding back $\epsilon_t = x_t - \hat{x}_t$
• However, convergence is not guaranteed
GARCH
• Extension of ARCH:

  $\sigma_t^2 = \sum_{i=1}^{p} a_i r_{t-i}^2 + \sum_{i=1}^{p} b_i \sigma_{t-i}^2$

• Explains "volatility clustering"
• A neural network can again serve as a nonlinear version
• Using past estimates: a recurrent network
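The GARCH recursion in its simplest one-lag form can be sketched as follows; the coefficients, the starting variance, and the returns are illustrative, not estimates.

```python
# GARCH(1,1)-style recursion: sigma_t^2 = a1*r_{t-1}^2 + b1*sigma_{t-1}^2
def garch11(returns, a1, b1, sig2_0):
    sig2 = [sig2_0]                       # starting variance (assumed)
    for t in range(1, len(returns)):
        sig2.append(a1 * returns[t - 1] ** 2 + b1 * sig2[-1])
    return sig2

sig2 = garch11([0.1, 0.2, 0.1], a1=0.2, b1=0.7, sig2_0=1.0)
print(sig2)
```

The feedback on past variance estimates is what makes the neural analogue a recurrent network.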
State space models
• Observables depend on a (hidden) time-variant state:

  $\mathbf{s}_t = \mathbf{A}\mathbf{s}_{t-1} + \mathbf{B}\boldsymbol{\epsilon}_t, \quad \mathbf{x}_t = \mathbf{C}\mathbf{s}_t + \boldsymbol{\eta}_t$

• Strong relationship to recurrent (Elman) networks
• Nonlinear version only with additional hidden layers
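Simulating such a model shows the separation between hidden state and observable; the matrices and noise scales below are illustrative choices, not estimated quantities.

```python
import numpy as np

rng = np.random.default_rng(2)
# s_t = A s_{t-1} + B eps_t (hidden state),  x_t = C s_t + eta_t (observable)
# A, B, C are illustrative, not estimated from data
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = 0.1 * np.eye(2)
C = np.array([[1.0, 0.5]])

s = np.zeros(2)
xs = []
for _ in range(50):
    s = A @ s + B @ rng.normal(size=2)             # state evolves, never observed
    xs.append((C @ s)[0] + 0.05 * rng.normal())    # only x_t is observed

print(len(xs))  # 50
```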
Symbolic time series
• Examples:
  – DNA
  – Text
  – Quantised time series (e.g. "up" and "down")
• $x_t = s_i \in \Sigma$ (a symbol from an alphabet)
• Past information: the past p symbols → a probability distribution

  $p(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_{t-p})$

• Markov chains
• Problem: long substrings are rare
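A Markov-chain estimate of these conditional probabilities is just substring counting; the "up"/"down" sequence below is illustrative. For large p the contexts become long substrings, which makes the rarity problem visible immediately.

```python
from collections import Counter

def markov_probs(sequence, p=1):
    """Estimate p(x_t | previous p symbols) from substring counts."""
    context_counts = Counter()
    joint_counts = Counter()
    for t in range(p, len(sequence)):
        ctx = sequence[t - p:t]                    # the past p symbols
        context_counts[ctx] += 1
        joint_counts[(ctx, sequence[t])] += 1
    return {key: cnt / context_counts[key[0]] for key, cnt in joint_counts.items()}

probs = markov_probs("uudduud", p=1)   # quantised series: "u" = up, "d" = down
print(probs[("u", "d")])  # 0.5
```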
Fractal prediction machines
• Similar subsequences are mapped to points close in space
• Clustering = extraction of stochastic automaton
Relationship to recurrent network
• Network of 2nd order
Other topics
• Filtering: corresponds to ARMA models; NNs as nonlinear filters
• Source separation: independent component analysis
• Relationship to stochastic automata
Practical considerations
• Stationarity is an important issue
• Preprocessing (trends, seasonalities)
• N-fold cross-validation done time-wise (each validation set must come after its training set)
• Mean and standard deviation → model selection

[Figure: the series split into train, validation, and test segments in temporal order]
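The time-wise split can be sketched as a small helper; the fold count and validation fraction are illustrative choices, and the essential property is that every validation range lies after its training range.

```python
def timewise_folds(n, n_folds, val_frac=0.2):
    """Time-wise N-fold cross-validation: each validation set lies AFTER its training set."""
    folds = []
    for k in range(1, n_folds + 1):
        end = k * (n // n_folds)            # use the first k segments of the series
        split = int(end * (1 - val_frac))   # the last stretch becomes validation
        folds.append((range(0, split), range(split, end)))
    return folds

folds = timewise_folds(100, n_folds=4)
for train, val in folds:
    print(len(train), len(val))  # validation always follows training in time
```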
Summary
• Neural networks are powerful semi-parametric models for nonlinear dependencies
• Can be considered as nonlinear extensions of classical time series and signal processing techniques
• Applying semi-parametric models to noise modeling adds another interesting facet
• Models must be treated with care; much data is necessary