
Artificial Neural Networks and Deep Learning

Jack Baker

STOR-i CDT, Lancaster University

1. Introduction

Artificial Neural Networks (ANNs) were originally inspired by networks of neurons found in the brains of animals. Although they have not quite reached the levels of complexity found in the brain, they have proved to be very useful tools in pattern recognition and machine learning, particularly in recent years.

2. What are Artificial Neural Networks?

Feedforward Neural Network

• A feedforward neural network consists of layers of units.

• To each unit j a linear combination of the inputs x is passed.

• These inputs are weighted by a vector w_j to give

$$ a_j = \mathbf{w}_j^{\top}\mathbf{x} + w_0, \qquad (1) $$

where w_0 is a constant known as the bias.

• The linear combination is then transformed by a function h(·) to give the activation z_j = h(a_j) of the unit.

• The activation of a unit in each layer is then passed as an input to the next layer.

• The last set of units pass their activations as outputs y_k of the network.

Note that feedforward in this case simply means that the activations of each unit must be passed to the next layer, not to one behind it. Using the same notation as in (1), Figure 1 shows a feedforward neural network with one hidden layer.

[Figure 1 diagram: inputs x_1, x_2, x_3 → weights W^(1) → hidden units z_1, ..., z_4 → weights W^(2) → outputs y_1, y_2.]

Figure 1: Single-hidden-layer feedforward neural network [adapted from colored neural network by Glosser.ca, CC BY-SA].
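To make the forward pass in Figure 1 concrete, here is a minimal sketch (not the poster's own code) of how the activations are computed, assuming a tanh choice of h; the weight values and dimensions are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 0.3])       # inputs x_1, x_2, x_3
W1 = rng.normal(size=(4, 3))         # W^(1): one weight vector w_j per hidden unit
w0_1 = rng.normal(size=4)            # hidden-layer biases w_0
W2 = rng.normal(size=(2, 4))         # W^(2)
w0_2 = rng.normal(size=2)            # output-layer biases

# Hidden layer: equation (1), then the activation z_j = h(a_j)
a = W1 @ x + w0_1
z = np.tanh(a)

# Output layer: the activations z are passed forward as inputs to the output units
y = W2 @ z + w0_2
print(y)                             # outputs y_1, y_2
```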

3. Motivation

• Recently, neural-network-based algorithms have won a large number of pattern recognition competitions.

• ANNs are able to approximate any continuous function to any desired accuracy.

• ANNs are highly flexible, and can be used to solve many different problems such as classification, forecasting and regression.

• ANNs can be built such that they are highly invariant to linear transformations (see Section 5).

Despite the widespread use of ANNs, the theoretical properties of some of the methods are not well understood.

4. Parameter Optimization

There are a number of methods used to find optimal weights W when training feedforward neural networks.

Error Backpropagation

The error function of an ANN is rarely minimisable analytically. A natural choice is therefore to approximate the minimum of the error numerically using a gradient descent algorithm such as stochastic gradient descent. Due to the form of the error function, application of the chain rule allows us to write, for each hidden unit j,

$$ \delta_j := \frac{\partial E}{\partial a_j} = h'(a_j) \sum_{k} w_{kj}\,\delta_k, \qquad (2) $$

where k runs over all units to which j sends a connection, and we have suppressed the subscript n indexing each observation. Iteratively applying the backpropagation formula (2), together with the corresponding expressions for the output units and weight derivatives, allows us to efficiently compute the gradient of the error term, so that our chosen gradient descent algorithm can be applied.
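Below is a minimal sketch (an illustrative assumption, not the poster's own implementation) of how equations (1) and (2) combine with stochastic gradient descent for a single hidden layer, assuming a tanh activation and a squared-error loss; the dimensions, learning rate and toy data are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, K = 3, 4, 2                          # input, hidden and output dimensions
W1 = rng.normal(scale=0.1, size=(H, D))    # hidden-layer weights
b1 = np.zeros(H)                           # hidden-layer biases
W2 = rng.normal(scale=0.1, size=(K, H))    # output-layer weights
b2 = np.zeros(K)

def forward(x):
    a1 = W1 @ x + b1        # linear combination, equation (1)
    z = np.tanh(a1)         # activation z_j = h(a_j)
    y = W2 @ z + b2         # linear output units
    return a1, z, y

def sgd_step(x, t, lr=0.01):
    """One stochastic gradient descent update for a single observation (x, t)."""
    global W1, b1, W2, b2
    a1, z, y = forward(x)
    delta_out = y - t                                  # deltas at the linear outputs (squared error)
    # Backpropagation formula (2): delta_j = h'(a_j) * sum_k w_kj * delta_k
    delta_hidden = (1.0 - np.tanh(a1) ** 2) * (W2.T @ delta_out)
    # Gradients of E with respect to each weight, followed by a gradient descent step
    W2 -= lr * np.outer(delta_out, z)
    b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hidden, x)
    b1 -= lr * delta_hidden

# Toy regression example
X = rng.normal(size=(200, D))
T = np.column_stack([np.sin(X[:, 0]), X[:, 1] * X[:, 2]])
for epoch in range(50):
    for x, t in zip(X, T):
        sgd_step(x, t)
```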

Learning in Deep Networks

The training of deep neural networks using error backpropagation regularly leads to particularly poor results. This led [1] to study generative neural networks, where each unit is treated as a random variable rather than being deterministic. It was found that a particular generative model, known as a deep belief network, could be trained layer by layer in an unsupervised way by training each layer as a restricted Boltzmann machine. These parameter estimates could then be tuned further using more traditional methods.
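A rough sketch of the layer-wise idea follows, assuming binary units and one-step contrastive divergence as the way each restricted Boltzmann machine is fitted; the function names, step sizes and layer sizes are illustrative assumptions and not taken from [1].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, lr=0.05, epochs=10, rng=None):
    """Fit one RBM layer to binary data V (n_samples x n_visible) with CD-1."""
    rng = rng or np.random.default_rng(0)
    n_visible = V.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    b_v = np.zeros(n_visible)       # visible biases
    b_h = np.zeros(n_hidden)        # hidden biases
    for _ in range(epochs):
        for v0 in V:
            # Up pass: sample hidden units given the data
            p_h0 = sigmoid(v0 @ W + b_h)
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            # Down-up pass: reconstruct visibles, then recompute hidden probabilities
            p_v1 = sigmoid(h0 @ W.T + b_v)
            v1 = (rng.random(n_visible) < p_v1).astype(float)
            p_h1 = sigmoid(v1 @ W + b_h)
            # CD-1 update: data statistics minus reconstruction statistics
            W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
            b_v += lr * (v0 - v1)
            b_h += lr * (p_h0 - p_h1)
    return W, b_v, b_h

def greedy_pretrain(data, layer_sizes):
    """Train a stack of RBMs layer by layer; each layer's hidden probabilities
    become the input data for the next layer."""
    layers, X = [], data
    for n_hidden in layer_sizes:
        W, b_v, b_h = train_rbm(X, n_hidden)
        layers.append((W, b_v, b_h))
        X = sigmoid(X @ W + b_h)    # pass activations upward
    return layers
```

The resulting weights can then serve as the initialisation that is "tuned further using more traditional methods" such as backpropagation.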

5. Convolutional Neural Networks

Recently neural networks have gained a lot of attention for their excellent performance in pattern recognition tasks. Part of this success is due to a particular design of their structure, developed by [2], which makes the network invariant to various transformations of the input. The development is known as a convolutional neural network; a diagram of an early design is given in Figure 2.

[Figure 2 diagram: an 83x83 input is passed through a 9x9 convolution (64 kernels) to give Layer 1 (64@75x75), then 10x10 pooling with 5x5 subsampling to give Layer 2 (64@14x14), a 9x9 convolution (4096 kernels) to give Layer 3 (256@6x6), then 6x6 pooling with 4x4 subsampling to give Layer 4 (256@1x1), feeding a 101-unit output.]

Figure 2: Diagram of an early convolutional neural network [Hierarchical Models of Perception and Reasoning, Yann LeCun, 2013 (presentation)].

The structure works by feeding the inputs to the next layer into grids of units, known as feature maps, with a small patch of the input fed to each unit on the grid. However, on each grid the units are constrained to share the same weight parameters. Unlike many deep networks, optimal weights can be found using a modified version of error backpropagation.
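The sketch below illustrates this weight sharing: one feature map is produced by sliding a single shared 9x9 kernel over the input, so every unit on the grid applies the same weights to its own small patch. It is an illustrative example only (echoing the 83x83 input and 9x9 kernels of Figure 2), not the architecture of [2].

```python
import numpy as np

def feature_map(image, kernel, bias=0.0):
    """One feature map: every unit applies the same shared weights (the kernel)
    to a small patch of the input, i.e. a valid 2-D convolution plus a nonlinearity."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]          # small patch of the input
            out[i, j] = np.sum(patch * kernel) + bias  # same weights at every location
    return np.tanh(out)

# Example mirroring the first stage of Figure 2: an 83x83 input and a 9x9 kernel
rng = np.random.default_rng(0)
image = rng.normal(size=(83, 83))
kernel = rng.normal(scale=0.1, size=(9, 9))
fmap = feature_map(image, kernel)
print(fmap.shape)   # (75, 75), matching the 75x75 feature maps of Layer 1
```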

Figure 3: Real-time scene parsing using neural networks [Farabet et al. 2012]

References

[1] Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[2] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.