WK4 – Radial Basis Function Networks

Contents

Time Series

Prediction

TS & NNs

RBF Model

CS 476: Networks of Neural Computation, CSD, UOC, 2009

Conclusions

WK4 – Radial Basis Function Networks

CS 476: Networks of Neural Computation

Dr. Stathis Kasderidis

Dept. of Computer Science

University of Crete

Spring Semester, 2009

Contents

•Introduction to Time Series Analysis
•Prediction Problem
•Predicting Time Series with Neural Networks
•Radial Basis Function Network
•Conclusions

Time Series

Introduction to Time Series Analysis

•There are two major classes of statistical problems:
  •Classification problems (given an input x, find which of a set of K known classes it belongs to);
  •Regression problems (try to build a functional relationship between independent and regressed variables. The former are the causes, while the latter are the effects).
•Regression problems arise from the need for:
  •Explanation
  •Prediction
  •Control

Introduction to Time Series Analysis II

•In a regression problem, there are two high-level issues to determine:
  •The nature of the mechanism that generates the data (stochastic or deterministic). This affects which class of models one will use;
  •A modelling procedure.

Introduction to Time Series Analysis III

•A modelling procedure usually includes the following steps:

1. Specification of a model:
  •Whether it describes a function or a probability distribution;
  •Whether it is linear or non-linear;
  •Whether it is parametric or non-parametric;
  •Whether it is a mixture or a single function;
  •Whether it includes time explicitly or not;
  •Whether it includes memory or not.

Introduction to Time Series Analysis IV

2. Preparation of the data:
  •Noise reduction;
  •Scaling;
  •Appropriate representation for the target problem;
  •Transformations;
  •De-correlation (cleaning up spatial or temporal correlation structure);
  •Feature extraction;
  •Handling missing values.

Introduction to Time Series Analysis V

3. An estimation procedure (i.e. a framework to estimate the model parameters):
  •Maximum Likelihood estimation;
  •Bayesian estimation;
  •(Ordinary) Least Squares.
•Numerical techniques used in the estimation framework include:
  •Optimisation;
  •Integration;
  •Graph-theoretic methods;
  •etc.

Introduction to Time Series Analysis VI

•Availability of data:
  •Enough in number;
  •Quality;
  •Resolution.
•The estimators produced by the framework must be:
  •Unbiased (i.e. they do not systematically differ from the true model in a statistical sense);
  •Consistent (i.e. as the number of data points grows, the estimator approaches the true model with probability 1).

Introduction to Time Series Analysis VII

4. A model selection procedure (i.e. to select the best model). Factors include:
  •Goodness of fit (i.e. how well the model fits the given data);
  •Generalisation (i.e. how well the model approximates the underlying data-generating mechanism);
  •Confidence intervals.

Introduction to Time Series Analysis VIII

5. Testing a model:
  •Test the model on out-of-sample data;
  •Re-iterate the modelling procedure until we produce a model with which we are satisfied;
  •Compare different classes of models in order to find the best one;
  •Usually we select the simplest class that describes the data well;
  •A framework for comparing different classes of models is not always available.
•Neural networks are semi-parametric, non-linear statistical modelling techniques.
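Out-of-sample testing of a time series model can be sketched as a chronological split: the last part of the series is held back and never shuffled into the training data. The function name `chronological_split` and the 30% test fraction are illustrative assumptions, not part of the course material.

```python
def chronological_split(series, test_fraction=0.2):
    """Split a series into in-sample and out-of-sample parts, preserving time order."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    n_test = max(1, int(round(len(series) * test_fraction)))
    split = len(series) - n_test
    # Earlier observations are used for estimation, later ones for testing.
    return series[:split], series[split:]


series = list(range(10))
train, test = chronological_split(series, test_fraction=0.3)
print(train, test)
```

Splitting by time rather than at random keeps future values out of the estimation step, which is what "out-of-sample" means for a forecasting model.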

Prediction

The Prediction Problem

•Def: A time series, $\{X_t\}$, is a family of real-valued random variables indexed by t. The index t can take values in $\mathbb{R}$ or $\mathbb{Z}$.
•When a family of variables is defined at all points in time it is called continuous; otherwise it is called discrete.
•In practice we always have a discrete series, due to discrete sampling times of a continuous series or due to digitisation.
•The length of a series is the time elapsed between the recorded start and finish of the series.

The Prediction Problem II

•Def: A time series, $\{X_t\}$, is called (strictly) stationary if, for any $t_1, t_2, \ldots, t_n \in I$, any $k \in I$ and $n = 1, 2, \ldots$:

$$P_{x_{t_1}, x_{t_2}, \ldots, x_{t_n}}(x_1, x_2, \ldots, x_n) = P_{x_{t_1+k}, x_{t_2+k}, \ldots, x_{t_n+k}}(x_1, x_2, \ldots, x_n)$$

where P denotes the joint distribution function of the set of random variables which appear as suffixes, and I is an appropriate indexing set.

•Broadly speaking, a time series is stationary if there is no systematic change in mean, no systematic change in variance, and strictly periodic variations have been removed.

The Prediction Problem III

•In classical time series analysis we decompose a time series into the following components:
  •A trend (a long-term movement);
  •Fluctuations about the trend of greater or lesser regularity;
  •A seasonal component;
  •A residual (irregular or random effect).
•Typically the probability theory of time series examines stationary series and investigates residuals for further structure. However, in other cases we may be interested in capturing the trend (i.e. function approximation).

The Prediction Problem IV

•It is assumed that if the residuals do not contain any further structure, then they behave like an IID (independent and identically distributed) process, which is usually assumed to be normal. Such a stochastic process cannot be modelled further, so the analysis of the time series terminates;
•If, on the other hand, the series contains more structure, we re-iterate the analysis until the residuals do not contain any structure.
•Tests for checking the residuals include:
  •Kolmogorov-Smirnov test (for normality);
  •BDS test (for independence), etc.
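As a rough illustration of the Kolmogorov-Smirnov idea, the sketch below computes the KS statistic, i.e. the largest gap between the empirical distribution of the residuals and a reference normal distribution. The residual values are made up, and the function names are illustrative. Turning the statistic into a decision needs critical values, which are not computed here; if the mean and variance are estimated from the same residuals, the standard KS critical values no longer apply directly.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """Cumulative distribution function of N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """D = sup_x |F_n(x) - F(x)|, with F_n the empirical CDF of the sample."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


residuals = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # hypothetical residuals
d_stat = ks_statistic(residuals, normal_cdf)
print(d_stat)
```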

The Prediction Problem V

•If the structure of the series is linear then we fit a linear model such as ARMA or, if the series is non-stationary, the ARIMA model.
•On the other hand, for non-linear structure we use ARCH, GARCH and neural network models. Typically we first fit the linear component with a linear model and then fit the residuals with a non-linear model.
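The "linear model first, residuals second" idea can be sketched with the simplest linear model, an AR(1) process without intercept fitted by least squares. This is a deliberately reduced stand-in for a full ARMA fit, and the series is an artificial noise-free example. The residuals returned at the end are what would be passed on to a non-linear model.

```python
def fit_ar1(series):
    """Least-squares estimate of a in x_t = a * x_{t-1} + e_t (no intercept)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    if den == 0:
        raise ValueError("series has no variation to fit")
    return num / den


def ar1_residuals(series, a):
    """One-step-ahead errors of the fitted AR(1) model."""
    return [series[t] - a * series[t - 1] for t in range(1, len(series))]


series = [8.0, 4.0, 2.0, 1.0, 0.5]  # artificial series with x_t = 0.5 * x_{t-1}
a_hat = fit_ar1(series)
residuals = ar1_residuals(series, a_hat)
print(a_hat, residuals)
```

In this toy case the linear model explains everything, so the residuals vanish and no second stage would be needed.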

The Prediction Problem VI

•Usually a time series does not have all the desirable statistical properties, so we transform it before starting the analysis in order to achieve better results. Typical transforms include:
  •Stabilising the variance;
  •Making seasonal effects additive;
  •Making the data normally distributed;
  •Filtering (FFT, moving averages, exponential smoothing, low- and high-pass filters, etc.);
  •Differencing (the preferred method for de-trending; we apply differencing until the time series becomes stationary).
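Differencing can be illustrated on two artificial series: one pass removes a linear trend, and a second pass is needed for a quadratic trend. The helper name `difference` is illustrative.

```python
def difference(series, lag=1):
    """Return the differenced series y_t = x_t - x_{t-lag}."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


linear_trend = [2 * t + 1 for t in range(6)]   # 1, 3, 5, 7, 9, 11
quadratic_trend = [t * t for t in range(6)]    # 0, 1, 4, 9, 16, 25

first = difference(linear_trend)                    # constant after one pass
second = difference(difference(quadratic_trend))    # constant after two passes
print(first, second)
```

Each differencing pass shortens the series by `lag` points, so in practice one stops as soon as the series looks stationary.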

The Prediction Problem VII

•Restating the prediction problem:

“We want to construct a model with an appropriate technique which, once estimated, can give 'good' forecasts on new data. The new data are commonly some future values of the series. We want the model to predict the future values of the time series as accurately as possible, given as input some previous values of the series.”

The Prediction Problem VIII

•There are three main approaches used to model the series prediction problem:
  •A. Assume a functional relationship as a generating mechanism, e.g. $X_{t+1} = F(\mathbf{X}_t)$, where $\mathbf{X}_t$ is an appropriate vector of past values and F is the generating mechanism;
  •B. Assume that the map F has multiple branches. Then the returned output represents the probability of obtaining $X_{t+1}$ in any one of the branches of F;
  •C. Divide the input into a set of classes and try to learn the map from input to classes, i.e. a classification problem.
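Approach A can be made concrete by turning a series into (past values, next value) training pairs with a sliding window. The helper `make_windows` and the choice of two lags are illustrative assumptions; choosing the number of lags is itself part of the modelling problem.

```python
def make_windows(series, n_lags):
    """Build input vectors of n_lags past values and the value that follows each."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])  # vector of past values
        targets.append(series[t])            # value to be predicted
    return inputs, targets


series = [10, 11, 12, 13, 14]
inputs, targets = make_windows(series, n_lags=2)
print(inputs, targets)
```

Any regression model, including an RBF network, can then be trained to approximate F on these pairs.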

TS & NNs

Time Series Prediction using Neural Networks

•To apply a neural network model to time series prediction we have to make choices on the following issues:
•Preparing the data:
  •Transforming the data (see above);
  •Handling missing values;
  •Smoothing the data (if needed);
  •Scaling the data (almost always a good idea!);
  •Dimensionality reduction (principal component analysis, factor analysis);
  •De-correlating the data;
  •Extracting features (i.e. combinations of variables).
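One common way to scale the data is standardisation to zero mean and unit variance. The sketch below estimates the scaling parameters on the training values only and then applies them; the function names are illustrative, and other scalings (e.g. to a fixed interval) are equally possible.

```python
import math


def fit_standardiser(train):
    """Estimate mean and standard deviation from the training data only."""
    mean = sum(train) / len(train)
    var = sum((x - mean) ** 2 for x in train) / len(train)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for a constant series
    return mean, std


def standardise(values, mean, std):
    """Apply a previously fitted scaling to any data (training or new)."""
    return [(x - mean) / std for x in values]


train = [2.0, 4.0, 6.0, 8.0]
mean, std = fit_standardiser(train)
scaled = standardise(train, mean, std)
print(scaled)
```

Reusing the training mean and standard deviation on new data keeps the out-of-sample test free of information from the future.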

Time Series Prediction using Neural Networks II

•Representing variables:
  •Continuous or discrete;
  •Semantics of variables (i.e. probabilities, categories, data points, etc.);
  •Distributed or atomic representation;
  •Variables with little information content can harm generalisation;
  •In Bayesian estimation the method of Automatic Relevance Determination can be used for selecting variables;
  •Selecting features;
  •Capturing causal relations.

Time Series Prediction using Neural Networks III

•Discovering ‘memory’ in the generating process:
  •Trial and error;
  •Partial and auto-correlation functions (linear);
  •Mutual information function (non-linear);
  •Methods from dynamical systems theory;
  •Determination of past values by fitting a model (e.g. linear) and eliminating past values with small contribution based on sensitivity.
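A sample autocorrelation function is one simple way to look for linear memory: lags with large absolute autocorrelation are candidates for inclusion in the input vector. The sketch below uses one common estimator (normalised by the lag-0 sum of squares) on an artificial alternating series.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag, normalised by the lag-0 term."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


series = [1.0, -1.0] * 10  # artificial series that flips sign at every step
acf = [autocorrelation(series, k) for k in range(4)]
print(acf)
```

The alternating series shows strong negative correlation at odd lags and strong positive correlation at even lags, i.e. a clear one-step memory.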

Time Series Prediction using Neural Networks IV

•Selecting an architecture:
  •Type of training;
  •Family of models;
  •Transfer function;
  •Memory;
  •Network topology;
  •Other parameters in network specification.
•Model selection:
  •See discussion in WK3.

Time Series Prediction using Neural Networks V

•Determination of confidence intervals:
  •Jackknife method (a linear approximation of the Bootstrap);
  •Bootstrap;
  •Moving Blocks Bootstrap;
  •Bootstrap t-interval;
  •Bootstrap percentile interval;
  •Bias-corrected and accelerated Bootstrap.
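As a minimal sketch of the bootstrap percentile interval, the code below resamples a set of made-up forecast errors with replacement and reads the interval off the sorted resampled means. The function name, the number of resamples and the fixed seed are illustrative assumptions. Plain resampling treats the observations as independent; for serially dependent data the moving blocks variant listed above is the usual alternative.

```python
import random


def bootstrap_percentile_interval(data, statistic, n_resamples=2000,
                                  alpha=0.05, seed=0):
    """Percentile interval for a statistic, from resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    stats = sorted(
        statistic([rng.choice(data) for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]


def mean(values):
    return sum(values) / len(values)


errors = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 1.0]  # hypothetical forecast errors
low, high = bootstrap_percentile_interval(errors, mean)
print(low, high)
```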

Time Series Prediction using Neural Networks VI

•Additional Literature:

1. Masters T. (1995). Neural, Novel & Hybrid Algorithms for Time Series Prediction. Wiley.
2. Pawitan Y. (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press.
3. Chatfield C. (1989). The Analysis of Time Series: An Introduction, 4th Ed. Chapman & Hall.
4. Harvey A. (1993). Time Series Models. Harvester Wheatsheaf.
5. Efron B., Tibshirani R. (1993). An Introduction to the Bootstrap. Chapman & Hall.

RBF Model

Radial Basis Function Model

•There are only three layers: Input, Hidden and Output. There is only one hidden layer.

Radial Basis Function Model II

•The hidden layer provides a non-linear transformation of the input space to the hidden space, which is usually assumed to be of high enough dimension.
•The output layer combines the activations of the hidden layer linearly.
•Note: The RBF model owes its development to ideas of fitting hyper-surfaces to data points in a high-dimensional space.
•In numerical analysis, radial-basis functions were introduced for the solution of real multivariate interpolation problems.

Radial Basis Function Model III

•In the RBF model the hidden units provide a set of “functions” that constitute an arbitrary “basis” for the input patterns when they are expanded to the hidden space.
•The inspiration for the RBF model is based on Cover’s theorem (1965) on the separability of patterns: “A complex pattern-classification problem cast in a high-dimensional space nonlinearly is more likely to be linearly separable than in a low-dimensional space”.
•This leads us to consider the multivariable interpolation problem in a high-dimensional space:

Radial Basis Function Model IV

Given a set of N different points $\{x_i \in \mathbb{R}^{m_0} \mid i = 1, 2, \ldots, N\}$ and a corresponding set of N real numbers $\{d_i \in \mathbb{R}^1 \mid i = 1, 2, \ldots, N\}$, find a function $F: \mathbb{R}^{m_0} \to \mathbb{R}^1$ that satisfies the interpolation condition:

$$F(x_i) = d_i, \quad i = 1, 2, \ldots, N$$

For strict interpolation, the interpolating surface, i.e. F, is constrained to pass through all data points.

•The radial-basis function (RBF) technique consists of choosing a function F of the following form:

$$F(x) = \sum_{i=1}^{N} w_i \, \varphi(\|x - x_i\|)$$

Radial Basis Function Model V

Where $\{\varphi(\|x - x_i\|) \mid i = 1, 2, \ldots, N\}$ is a set of N arbitrary functions, known as radial-basis functions, and $\|\cdot\|$ denotes a norm, usually the Euclidean. The data points $x_i \in \mathbb{R}^{m_0}$ are taken to be the centers of the radial-basis functions.

•Assume that d denotes the desired response vector and w the linear weight vector. N is the size of the training set. Let $\Phi$ denote an N x N matrix with elements:

$$\varphi_{ji} = \varphi(\|x_j - x_i\|), \quad (j, i) = 1, 2, \ldots, N$$

$\Phi$ is called the interpolation matrix.

Radial Basis Function Model VI

•Thus, according to the interpolation condition above, we can write:

$$\Phi w = d$$

The solution for the weight vector is:

$$w = \Phi^{-1} d$$

assuming that $\Phi$ is non-singular. Micchelli's theorem provides assurances for a set of functions that create a non-singular matrix $\Phi$:

Let $\{x_i\}_{i=1}^{N}$ be a set of distinct points in $\mathbb{R}^{m_0}$. Then the N x N interpolation matrix $\Phi$ is nonsingular.
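The strict interpolation procedure can be sketched in a few lines: build the interpolation matrix from pairwise distances, solve the linear system for the weights, and check that the fitted surface passes through every data point. The Gaussian basis with unit width, the XOR-style data and the small elimination-based solver are illustrative choices; a numerical library would normally be used for the linear solve.

```python
import math


def gaussian(r, sigma=1.0):
    return math.exp(-r * r / (2.0 * sigma * sigma))


def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def solve(matrix, rhs):
    """Solve matrix * w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - s) / aug[r][r]
    return w


def fit_interpolant(points, targets, sigma=1.0):
    """Build the interpolation matrix and solve for the weight vector."""
    phi = [[gaussian(distance(xj, xi), sigma) for xi in points] for xj in points]
    return solve(phi, targets)


def evaluate(x, points, weights, sigma=1.0):
    """F(x) = sum_i w_i * phi(||x - x_i||), with the data points as centers."""
    return sum(w * gaussian(distance(x, xi), sigma) for w, xi in zip(weights, points))


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [0.0, 1.0, 1.0, 0.0]  # XOR-style targets
weights = fit_interpolant(points, targets)
fitted = [evaluate(p, points, weights) for p in points]
print(fitted)
```

Because there is one basis function per data point, the cost of the solve grows quickly with N, which is one motivation for using fewer centers than data points.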

Radial Basis Function Model VII

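•Functions that are covered by Micchelli's theorem include:
  •Multiquadrics: $\varphi(r) = (r^2 + c^2)^{1/2}$, $c > 0$, $r \in \mathbb{R}$
  •Inverse multiquadrics: $\varphi(r) = 1 / (r^2 + c^2)^{1/2}$, $c > 0$, $r \in \mathbb{R}$
  •Gaussian functions: $\varphi(r) = \exp(-r^2 / (2\sigma^2))$, $\sigma > 0$, $r \in \mathbb{R}$
•All that is required for $\Phi$ to be nonsingular is that the points $x_i$ be different.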
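The three families can be written directly as functions of the distance r. The default parameter values below are arbitrary illustrative choices.

```python
import math


def multiquadric(r, c=1.0):
    """phi(r) = (r^2 + c^2)^(1/2); grows with r."""
    return math.sqrt(r * r + c * c)


def inverse_multiquadric(r, c=1.0):
    """phi(r) = 1 / (r^2 + c^2)^(1/2); decays with r."""
    return 1.0 / math.sqrt(r * r + c * c)


def gaussian(r, sigma=1.0):
    """phi(r) = exp(-r^2 / (2 sigma^2)); decays with r."""
    return math.exp(-r * r / (2.0 * sigma * sigma))


for r in (0.0, 1.0, 2.0):
    print(r, multiquadric(r), inverse_multiquadric(r), gaussian(r))
```

The inverse multiquadric and the Gaussian are localised (they shrink as r grows), whereas the multiquadric increases with distance.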

Radial Basis Function Model VIII

•Universal Approximation Theorem for RBF Networks: For any continuous input-output mapping function f(x) there is an RBF network with a set of centers $\{t_i\}_{i=1}^{m_1}$ and a common width $\sigma > 0$ such that the input-output mapping function F(x) realized by the RBF network is close to f(x) in the $L_p$ norm, $p \in [1, \infty]$.

The RBF network consists of functions $F: \mathbb{R}^{m_0} \to \mathbb{R}$ represented by:

$$F(x) = \sum_{i=1}^{m_1} w_i \, G\!\left(\frac{x - t_i}{\sigma}\right)$$

Radial Basis Function Model IX

•Results on sample complexity, computational complexity and generalisation performance for RBF networks:
  •The generalisation error converges to zero only if the number of hidden units, $m_1$, increases more slowly than the size N of the training sample;
  •For a given size N of training sample, the optimum number of hidden units, $m_1^*$, behaves as $m_1^* \propto N^{1/3}$;
  •The RBF network exhibits a rate of approximation $O(1/m_1)$ that is similar to that of an MLP with sigmoid activation functions.

Radial Basis Function Model X

•Comparison of MLP and RBF networks:

1. An RBF network has a single hidden layer. An MLP has one or more hidden layers;
2. Typically the nodes of an MLP in a hidden or output layer share the same neuronal model. On the other hand, the nodes of an RBF network in the hidden layer play a different role than those in the output layer;
3. The hidden layer of an RBF network is non-linear and its output layer is linear. Typically in an MLP both layers are non-linear;

Radial Basis Function Model XI

4. In an RBF network the argument of each hidden unit's activation function is the Euclidean norm (distance) between the input vector and the center of the unit. In MLP networks the argument of the activation function is the inner product of the input vector and the weight vector of the node;
5. MLPs are global approximators; RBF networks are local approximators, due to the localised decaying Gaussian (or other) function.

Learning Law for Radial Basis Networks

•To develop a learning law for RBF networks we assume that the error function has the following form:

$$E = \frac{1}{2} \sum_{j=1}^{N} e_j^2$$

where N is the size of the training sample used to do the learning, and $e_j$ is the error signal defined by:

$$e_j = d_j - F(x_j) = d_j - \sum_{i=1}^{M} w_i \, G(\|x_j - t_i\|_{C_i})$$

Learning Law for Radial Basis Networks II

•We need to find the free parameters $w_i$, $t_i$ and $\Sigma_i^{-1}$ so as to minimise E. $C_i$ is a norm weighting matrix, i.e.:

$$\|x\|_C^2 = (Cx)^T (Cx) = x^T C^T C x$$

•We use a weighted norm matrix when the individual elements of x belong to different classes.
•To calculate the update equations we use gradient descent on the instantaneous error function E. We get the following update rules for the free parameters:

Learning Law for Radial Basis Networks III

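1. Linear weights (output layer):

$$\frac{\partial E(n)}{\partial w_i(n)} = -\sum_{j=1}^{N} e_j(n) \, G(\|x_j - t_i(n)\|_{C_i})$$

$$w_i(n+1) = w_i(n) - \eta_1 \frac{\partial E(n)}{\partial w_i(n)}, \quad i = 1, 2, \ldots, m_1$$

2. Positions of centers (hidden layer):

$$\frac{\partial E(n)}{\partial t_i(n)} = 2 w_i(n) \sum_{j=1}^{N} e_j(n) \, G'(\|x_j - t_i(n)\|_{C_i}) \, \Sigma_i^{-1} [x_j - t_i(n)]$$

$$t_i(n+1) = t_i(n) - \eta_2 \frac{\partial E(n)}{\partial t_i(n)}, \quad i = 1, 2, \ldots, m_1$$

Here $G'(\cdot)$ denotes the first derivative of $G(\cdot)$ with respect to its argument.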

Learning Law for Radial Basis Networks IV

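3. Spreads of centers (hidden layer):

$$\frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} = -w_i(n) \sum_{j=1}^{N} e_j(n) \, G'(\|x_j - t_i(n)\|_{C_i}) \, Q_{ji}(n)$$

$$Q_{ji}(n) = [x_j - t_i(n)][x_j - t_i(n)]^T$$

$$\Sigma_i^{-1}(n+1) = \Sigma_i^{-1}(n) - \eta_3 \frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)}$$

•Note that three different learning rates $\eta_1$, $\eta_2$, $\eta_3$ are used in the gradient descent equations.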
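The gradient descent learning law can be sketched for a small one-dimensional network. To keep the example short it makes several simplifying assumptions that are not in the slides: scalar inputs, Gaussian hidden units, a single fixed spread (so only the weights and centers are adapted, each with its own learning rate), batch updates, and an artificial sine target. With the error signal defined as desired output minus network output, the gradients of E carry a minus sign, as in the code.

```python
import math


def hidden(x, t, sigma):
    """Gaussian hidden-unit activation for scalar input x and center t."""
    return math.exp(-(x - t) ** 2 / (2.0 * sigma ** 2))


def predict(x, weights, centers, sigma):
    return sum(w * hidden(x, t, sigma) for w, t in zip(weights, centers))


def total_error(xs, ds, weights, centers, sigma):
    """E = 1/2 * sum_j e_j^2 with e_j = d_j - F(x_j)."""
    return 0.5 * sum((d - predict(x, weights, centers, sigma)) ** 2
                     for x, d in zip(xs, ds))


def train(xs, ds, weights, centers, sigma=0.5, eta_w=0.03, eta_t=0.01, epochs=200):
    """Batch gradient descent on weights and centers; the spread stays fixed."""
    weights, centers = list(weights), list(centers)
    history = [total_error(xs, ds, weights, centers, sigma)]
    for _ in range(epochs):
        errors = [d - predict(x, weights, centers, sigma) for x, d in zip(xs, ds)]
        grad_w, grad_t = [], []
        for w, t in zip(weights, centers):
            g = [hidden(x, t, sigma) for x in xs]
            # dE/dw_i = -sum_j e_j * G_ij
            grad_w.append(-sum(e * gj for e, gj in zip(errors, g)))
            # dE/dt_i = -w_i * sum_j e_j * G_ij * (x_j - t_i) / sigma^2
            grad_t.append(-w * sum(e * gj * (x - t) / sigma ** 2
                                   for e, gj, x in zip(errors, g, xs)))
        weights = [w - eta_w * gw for w, gw in zip(weights, grad_w)]
        centers = [t - eta_t * gt for t, gt in zip(centers, grad_t)]
        history.append(total_error(xs, ds, weights, centers, sigma))
    return weights, centers, history


xs = [i / 10 for i in range(21)]           # inputs on [0, 2]
ds = [math.sin(math.pi * x) for x in xs]   # artificial target function
weights, centers, history = train(
    xs, ds, weights=[0.0, 0.0, 0.0, 0.0], centers=[0.25, 0.75, 1.25, 1.75])
print(history[0], history[-1])
```

Adapting the spreads would add a third update of the same form with its own learning rate; it is left out here only to keep the sketch compact.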

Conclusions

•In time series modelling we seek to extract the maximum possible structure we can find in the series.
•We terminate the analysis of a series when the residuals do not contain any more structure, i.e. they have an IID structure.
•NNs can be used as models in time series prediction.
•RBF networks are a second paradigm of multilayer feed-forward networks, alongside multilayer perceptrons.
•They are inspired by interpolation theory (numerical analysis).
•They can be trained with the gradient descent method, the same as in the MLP case.