Neural Networks Slides from: Doug Gray, David Poole.

Transcript of Neural Networks Slides from: Doug Gray, David Poole.

Page 1: Neural Networks

Slides from: Doug Gray, David Poole

Page 2: What is a Neural Network?

• Information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information

• A method of computing, based on the interaction of multiple connected processing elements

Page 3: What can a Neural Net do?

Compute a known function

Approximate an unknown function

Pattern Recognition

Signal Processing

Learn to do any of the above

Page 4: Basic Concepts

[Figure: a box labeled "Neural Network" with inputs Input 0, Input 1, …, Input n and outputs Output 0, Output 1, …, Output m]

A Neural Network generally maps a set of inputs to a set of outputs

Number of inputs/outputs is variable

The Network itself is composed of an arbitrary number of nodes with an arbitrary topology

Page 5: Basic Concepts

Definition of a node:

• A node is an element which performs the function

y = fH(Σ(wi xi) + Wb)

[Figure: a single node. Inputs Input 0 … Input n are multiplied by weights W0 … Wn, summed together with the bias weight Wb, and passed through fH(x) to produce the Output; the node and its connections are labeled]
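As a rough sketch (not from the slides), the node function above can be written out directly; the step activation used for fH and the example weights below are assumptions for illustration.

```python
# Sketch of a single node: y = fH(sum of wi*xi over the inputs, plus Wb).
def node_output(inputs, weights, bias_weight, activation):
    """Weighted sum of the inputs plus the bias weight, passed through fH."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias_weight
    return activation(s)

def step(x):
    """A simple linear-threshold activation u(x)."""
    return 1 if x > 0 else 0

# Example with two inputs and made-up weights:
print(node_output([1.0, 0.0], weights=[0.5, -0.3], bias_weight=0.1, activation=step))
# prints 1, since 0.5*1.0 + (-0.3)*0.0 + 0.1 = 0.6 > 0
```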

Page 6: Properties

Inputs are flexible: any real values, highly correlated or independent

Target function may be discrete-valued, real-valued, or vectors of discrete or real values

Outputs are real numbers between 0 and 1

Resistant to errors in the training data

Long training time

Fast evaluation

The function produced can be difficult for humans to interpret

Page 7: Perceptrons

Basic unit in a neural network

Linear separator

Parts:
  N inputs, x1 ... xn
  Weights for each input, w1 ... wn
  A bias input x0 (constant) and associated weight w0
  Weighted sum of inputs, y = w0x0 + w1x1 + ... + wnxn
  A threshold function (activation function), i.e. 1 if y > 0, -1 if y <= 0
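A minimal sketch of such a perceptron, assuming the bias input x0 is fixed to 1 and folded into the weight vector; the numeric weights are placeholders.

```python
# Perceptron: weighted sum over x0..xn followed by a +1/-1 threshold.
def perceptron(x, w):
    """x[0] is the constant bias input (1); w[0] is its weight w0."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # y = w0*x0 + w1*x1 + ... + wn*xn
    return 1 if y > 0 else -1                 # threshold (activation) function

print(perceptron([1, 2.0, -1.0], [0.5, 0.25, 1.0]))
# prints -1: the weighted sum is 0.5 + 0.5 - 1.0 = 0.0, which is not > 0
```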

Page 8: Diagram

[Figure: perceptron diagram. Inputs x0, x1, x2, …, xn with weights w0, w1, w2, …, wn feed a summation unit computing y = Σ wixi, followed by a threshold that outputs 1 if y > 0 and -1 otherwise]

Page 9: Typical Activation Functions

F(x) = 1 / (1 + e^-x)

Using a nonlinear function which approximates a linear threshold allows a network to approximate nonlinear functions
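A quick sketch of this activation; the sample points are only to show how it smoothly approximates a hard threshold.

```python
import math

def sigmoid(x):
    """F(x) = 1 / (1 + e^-x)"""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-5, 0, 5):
    print(x, round(sigmoid(x), 3))  # -5 -> 0.007, 0 -> 0.5, 5 -> 0.993
```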

Page 10: Simple Perceptron

Binary logic application

fH(x) = u(x) [linear threshold]

Wi = random(-1,1)

Y = u(W0X0 + W1X1 + Wb)

Now how do we train it?

[Figure: a two-input perceptron. Input 0 and Input 1 are weighted by W0 and W1, summed with the bias weight Wb, and passed through fH(x) to produce the Output]

Page 11: Basic Training

Perceptron learning rule: ΔWi = η * (D - Y) * Xi

η = Learning Rate

D = Desired Output

Adjust weights based on how well the current weights match an objective
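One application of this rule as a sketch; the learning rate, inputs, and starting weights are made-up values.

```python
# Perceptron learning rule: delta_Wi = eta * (D - Y) * Xi
def update_weights(w, x, d, y, eta=0.1):
    """Return the weights after one update toward the desired output d."""
    return [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]

w = [0.2, -0.4, 0.1]   # [W0, W1, Wb]; the bias input is treated as a constant 1
x = [1, 0, 1]          # [X0, X1, bias input]
d, y = 1, 0            # desired output 1, current output 0
print(update_weights(w, x, d, y))
# approximately [0.3, -0.4, 0.2]: only weights whose input Xi is nonzero change
```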

Page 12: Logic Training

Expose the network to the logical OR operation

Update the weights after each epoch

As the output approaches the desired output for all cases, ΔWi will approach 0

X0  X1  D
 0   0  0
 0   1  1
 1   0  1
 1   1  1
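A sketch of this training setup. It uses the update rule from the previous slide, but applies it after each example rather than once per epoch; the learning rate, random initialization, and epoch limit are assumptions.

```python
import random

# OR training data: (X0, X1) -> D, as in the table above.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

eta = 0.2                                        # assumed learning rate
w = [random.uniform(-1, 1) for _ in range(3)]    # [W0, W1, Wb] = random(-1, 1)

def predict(x0, x1, w):
    """Y = u(W0*X0 + W1*X1 + Wb), with u(x) = 1 if x > 0 else 0."""
    return 1 if w[0] * x0 + w[1] * x1 + w[2] > 0 else 0

for epoch in range(100):                         # assumed epoch limit
    total_change = 0.0
    for (x0, x1), d in data:
        y = predict(x0, x1, w)
        for i, xi in enumerate((x0, x1, 1)):     # bias input is the constant 1
            dw = eta * (d - y) * xi              # delta_Wi = eta * (D - Y) * Xi
            w[i] += dw
            total_change += abs(dw)
    if total_change == 0:                        # delta_W has reached 0 for all cases
        break

print(w, [predict(x0, x1, w) for (x0, x1), _ in data])
# once the updates reach 0, the predictions are [0, 1, 1, 1]
```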

Page 13: Results

[Figure: plot of the weight values W0, W1, and Wb during training]

Page 14: Details

Network converges on a hyper-plane decision surface

The boundary is where W0X0 + W1X1 + Wb = 0, i.e. X1 = -(W0/W1)X0 - (Wb/W1)

[Figure: the decision boundary plotted as a line in the (X0, X1) plane]

Page 15: Feed-forward neural networks

Feed-forward neural networks are the most common models.

These are directed acyclic graphs:

Page 16: Neural Network for the news example

Page 17: Axiomatizing the Network

The values of the attributes are real numbers.

Thirteen parameters w0, …, w12 are real numbers.

The attributes h1 and h2 correspond to the values of hidden units.

There are 13 real numbers to be learned. The hypothesis space is thus a 13-dimensional real space.

Each point in this 13-dimensional space corresponds to a particular logic program that predicts a value for reads given known, new, short, and home.
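A sketch of a network with this parameter count, assuming the wiring suggested by the news example: four inputs (known, new, short, home), two hidden units h1 and h2 each with their own bias weight, and one output unit for reads with its own bias, giving 2*(4+1) + (2+1) = 13 parameters. The particular assignment of w0, …, w12 to connections and the placeholder values are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_reads(known, new, short, home, w):
    """w is a flat list of the 13 parameters w0..w12 (assignment assumed)."""
    inputs = [known, new, short, home]
    # Hidden units h1 and h2: 4 input weights + 1 bias each (10 parameters).
    h1 = sigmoid(sum(wi * xi for wi, xi in zip(w[0:4], inputs)) + w[4])
    h2 = sigmoid(sum(wi * xi for wi, xi in zip(w[5:9], inputs)) + w[9])
    # Output unit for reads: 2 weights + 1 bias (3 more, 13 in total).
    return sigmoid(w[10] * h1 + w[11] * h2 + w[12])

print(predict_reads(1, 0, 1, 0, [0.1] * 13))  # placeholder parameter values
```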

Page 18

Page 19: Prediction Error

Page 20: Neural Network Learning

Aim of neural network learning: given a set of examples, find parameter settings that minimize the error.

Back-propagation learning is gradient descent search through the parameter space to minimize the sum-of-squares error.
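For concreteness, the sum-of-squares error referred to here can be written in the standard way (this formula is not spelled out on the slide): E(w) = Σ over training examples e of (prediction for e given w - observed value for e)². Back-propagation adjusts the parameters w to reduce E.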

Page 21: Backpropagation Learning

Inputs:
  A network, including all units and their connections
  Stopping Criteria
  Learning Rate (constant of proportionality of gradient descent search)
  Initial values for the parameters
  A set of classified training data

Output: Updated values for the parameters

Page 22: Backpropagation Learning Algorithm

Repeat
  evaluate the network on each example given the current parameter settings
  determine the derivative of the error for each parameter
  change each parameter in proportion to its derivative
until the stopping criteria are met
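A minimal sketch of this loop under simplifying assumptions: a single sigmoid unit with three parameters instead of the full network, sum-of-squares error, and derivatives estimated numerically rather than by the usual analytic back-propagation. The learning rate, step limit, and data are made up; the same loop shape applies to larger parameter vectors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(x, w):
    """A single sigmoid unit with two inputs and a bias (three parameters)."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])

def error(w, examples):
    """Sum-of-squares error over (input, observed value) pairs."""
    return sum((predict(x, w) - t) ** 2 for x, t in examples)

def train(w, examples, eta=0.5, max_steps=5000, tol=1e-6, h=1e-5):
    w = list(w)
    for _ in range(max_steps):                               # Repeat:
        base = error(w, examples)                            #   evaluate on each example
        grads = []
        for i in range(len(w)):                              #   derivative of the error for
            bumped = list(w)                                  #   each parameter (numerical
            bumped[i] += h                                    #   estimate; backprop computes
            grads.append((error(bumped, examples) - base) / h)  # it analytically)
        w = [wi - eta * g for wi, g in zip(w, grads)]         #   step each parameter
        if all(abs(g) < tol for g in grads):                  # until stopping criteria are met
            break
    return w

examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # illustrative data (OR)
w = train([0.0, 0.0, 0.0], examples)
print(w, [round(predict(x, w), 2) for x, _ in examples])  # predictions should approach 0,1,1,1
```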

Page 23: Gradient Descent for Neural Net Learning

Page 24: Bias in neural networks and decision trees

It’s easy for a neural network to represent “at least two of I1, …, Ik are true”:

  w0   w1  …  wk
 -15   10  …  10

This concept forms a large decision tree.

Consider representing a conditional: “If c then a else b”:
  Simple in a decision tree.
  Needs a complicated neural network to represent (c ^ a) V (~c ^ b).
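A small check of these weights, assuming k = 3 inputs, a sigmoid unit, and true/false encoded as 1/0; the loop just evaluates the unit on a few input combinations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def at_least_two(inputs, w0=-15, wi=10):
    """w0 is the bias weight, wi the shared weight on each Ii, as on the slide."""
    return sigmoid(w0 + wi * sum(inputs))

for inputs in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(inputs, round(at_least_two(inputs), 3))
# zero or one input true -> output near 0; two or more true -> output near 1
```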

Page 25: Neural Networks and Logic

Meaning is attached to the input and output units.

There is no a priori meaning associated with the hidden units.

What the hidden units actually represent is something that’s learned.