Neural Networks Slides from: Doug Gray, David Poole.

Transcript of Neural Networks Slides from: Doug Gray, David Poole.

Page 1: Neural Networks

Slides from: Doug Gray, David Poole

Page 2: What is a Neural Network?

• Information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information

• A method of computing, based on the interaction of multiple connected processing elements

Page 3: What can a Neural Net do?

Compute a known function

Approximate an unknown function

Pattern Recognition

Signal Processing

Learn to do any of the above

Page 4: Basic Concepts

[Figure: a box labeled "Neural Network" with inputs Input 0, Input 1, …, Input n and outputs Output 0, Output 1, …, Output m]

A Neural Network generally maps a set of inputs to a set of outputs

Number of inputs/outputs is variable

The Network itself is composed of an arbitrary number of nodes with an arbitrary topology

Page 5: Basic Concepts

Definition of a node:

• A node is an element which performs the function

y = fH(Σ(wi xi) + Wb)

[Figure: a single node. Inputs Input 0 … Input n are multiplied by weights W0 … Wn, summed together with the bias weight Wb, and passed through fH(x) to produce the Output; the node and its connections are labeled]
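As a rough sketch (not from the slides), the node function above can be written out directly; the step activation used for fH and the example weights below are assumptions for illustration.

```python
# Sketch of a single node: y = fH(sum of wi*xi over the inputs, plus Wb).
def node_output(inputs, weights, bias_weight, activation):
    """Weighted sum of the inputs plus the bias weight, passed through fH."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias_weight
    return activation(s)

def step(x):
    """A simple linear-threshold activation u(x)."""
    return 1 if x > 0 else 0

# Example with two inputs and made-up weights:
print(node_output([1.0, 0.0], weights=[0.5, -0.3], bias_weight=0.1, activation=step))
# prints 1, since 0.5*1.0 + (-0.3)*0.0 + 0.1 = 0.6 > 0
```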

Page 6: Properties

Inputs are flexible: any real values, highly correlated or independent

Target function may be discrete-valued, real-valued, or vectors of discrete or real values

Outputs are real numbers between 0 and 1

Resistant to errors in the training data

Long training time

Fast evaluation

The function produced can be difficult for humans to interpret

Page 7: Perceptrons

Basic unit in a neural network

Linear separator

Parts:
  N inputs, x1 ... xn
  Weights for each input, w1 ... wn
  A bias input x0 (constant) and associated weight w0
  Weighted sum of inputs, y = w0x0 + w1x1 + ... + wnxn
  A threshold function (activation function), i.e. 1 if y > 0, -1 if y <= 0
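A minimal sketch of such a perceptron, assuming the bias input x0 is fixed to 1 and folded into the weight vector; the numeric weights are placeholders.

```python
# Perceptron: weighted sum over x0..xn followed by a +1/-1 threshold.
def perceptron(x, w):
    """x[0] is the constant bias input (1); w[0] is its weight w0."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # y = w0*x0 + w1*x1 + ... + wn*xn
    return 1 if y > 0 else -1                 # threshold (activation) function

print(perceptron([1, 2.0, -1.0], [0.5, 0.25, 1.0]))
# prints -1: the weighted sum is 0.5 + 0.5 - 1.0 = 0.0, which is not > 0
```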

Page 8: Diagram

[Figure: perceptron diagram. Inputs x0, x1, x2, …, xn with weights w0, w1, w2, …, wn feed a summation unit computing y = Σ wixi, followed by a threshold that outputs 1 if y > 0 and -1 otherwise]

Page 9: Typical Activation Functions

F(x) = 1 / (1 + e^-x)

Using a nonlinear function which approximates a linear threshold allows a network to approximate nonlinear functions
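A quick sketch of this activation; the sample points are only to show how it smoothly approximates a hard threshold.

```python
import math

def sigmoid(x):
    """F(x) = 1 / (1 + e^-x)"""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-5, 0, 5):
    print(x, round(sigmoid(x), 3))  # -5 -> 0.007, 0 -> 0.5, 5 -> 0.993
```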

Page 10: Simple Perceptron

Binary logic application

fH(x) = u(x) [linear threshold]

Wi = random(-1,1)

Y = u(W0X0 + W1X1 + Wb)

Now how do we train it?

[Figure: a two-input perceptron. Input 0 and Input 1 are weighted by W0 and W1, summed with the bias weight Wb, and passed through fH(x) to produce the Output]

Page 11: Basic Training

Perceptron learning rule: ΔWi = η * (D - Y) * Xi

η = Learning Rate

D = Desired Output

Adjust weights based on how well the current weights match an objective
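One application of this rule as a sketch; the learning rate, inputs, and starting weights are made-up values.

```python
# Perceptron learning rule: delta_Wi = eta * (D - Y) * Xi
def update_weights(w, x, d, y, eta=0.1):
    """Return the weights after one update toward the desired output d."""
    return [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]

w = [0.2, -0.4, 0.1]   # [W0, W1, Wb]; the bias input is treated as a constant 1
x = [1, 0, 1]          # [X0, X1, bias input]
d, y = 1, 0            # desired output 1, current output 0
print(update_weights(w, x, d, y))
# approximately [0.3, -0.4, 0.2]: only weights whose input Xi is nonzero change
```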

Page 12: Logic Training

Expose the network to the logical OR operation

Update the weights after each epoch

As the output approaches the desired output for all cases, ΔWi will approach 0

X0  X1  D
 0   0  0
 0   1  1
 1   0  1
 1   1  1
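A sketch of this training setup. It uses the update rule from the previous slide, but applies it after each example rather than once per epoch; the learning rate, random initialization, and epoch limit are assumptions.

```python
import random

# OR training data: (X0, X1) -> D, as in the table above.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

eta = 0.2                                        # assumed learning rate
w = [random.uniform(-1, 1) for _ in range(3)]    # [W0, W1, Wb] = random(-1, 1)

def predict(x0, x1, w):
    """Y = u(W0*X0 + W1*X1 + Wb), with u(x) = 1 if x > 0 else 0."""
    return 1 if w[0] * x0 + w[1] * x1 + w[2] > 0 else 0

for epoch in range(100):                         # assumed epoch limit
    total_change = 0.0
    for (x0, x1), d in data:
        y = predict(x0, x1, w)
        for i, xi in enumerate((x0, x1, 1)):     # bias input is the constant 1
            dw = eta * (d - y) * xi              # delta_Wi = eta * (D - Y) * Xi
            w[i] += dw
            total_change += abs(dw)
    if total_change == 0:                        # delta_W has reached 0 for all cases
        break

print(w, [predict(x0, x1, w) for (x0, x1), _ in data])
# once the updates reach 0, the predictions are [0, 1, 1, 1]
```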

Page 13: Results

[Figure: plot of the weight values W0, W1, and Wb during training]

Page 14: Details

Network converges on a hyper-plane decision surface

The boundary is where W0X0 + W1X1 + Wb = 0, i.e. X1 = -(W0/W1)X0 - (Wb/W1)

[Figure: the decision boundary plotted as a line in the (X0, X1) plane]

Page 15: Feed-forward neural networks

Feed-forward neural networks are the most common models.

These are directed acyclic graphs:

Page 16: Neural Network for the news example

Page 17: Axiomatizing the Network

The values of the attributes are real numbers.

Thirteen parameters w0, …, w12 are real numbers.

The attributes h1 and h2 correspond to the values of hidden units.

There are 13 real numbers to be learned. The hypothesis space is thus a 13-dimensional real space.

Each point in this 13-dimensional space corresponds to a particular logic program that predicts a value for reads given known, new, short, and home.
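A sketch of a network with this parameter count, assuming the wiring suggested by the news example: four inputs (known, new, short, home), two hidden units h1 and h2 each with their own bias weight, and one output unit for reads with its own bias, giving 2*(4+1) + (2+1) = 13 parameters. The particular assignment of w0, …, w12 to connections and the placeholder values are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_reads(known, new, short, home, w):
    """w is a flat list of the 13 parameters w0..w12 (assignment assumed)."""
    inputs = [known, new, short, home]
    # Hidden units h1 and h2: 4 input weights + 1 bias each (10 parameters).
    h1 = sigmoid(sum(wi * xi for wi, xi in zip(w[0:4], inputs)) + w[4])
    h2 = sigmoid(sum(wi * xi for wi, xi in zip(w[5:9], inputs)) + w[9])
    # Output unit for reads: 2 weights + 1 bias (3 more, 13 in total).
    return sigmoid(w[10] * h1 + w[11] * h2 + w[12])

print(predict_reads(1, 0, 1, 0, [0.1] * 13))  # placeholder parameter values
```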

Page 18

Page 19: Prediction Error

Page 20: Neural Network Learning

Aim of neural network learning: given a set of examples, find parameter settings that minimize the error.

Back-propagation learning is gradient descent search through the parameter space to minimize the sum-of-squares error.
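For concreteness, the sum-of-squares error referred to here can be written in the standard way (this formula is not spelled out on the slide): E(w) = Σ over training examples e of (prediction for e given w - observed value for e)². Back-propagation adjusts the parameters w to reduce E.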

Page 21: Backpropagation Learning

Inputs:
  A network, including all units and their connections
  Stopping Criteria
  Learning Rate (constant of proportionality of gradient descent search)
  Initial values for the parameters
  A set of classified training data

Output: Updated values for the parameters

Page 22: Backpropagation Learning Algorithm

Repeat
  evaluate the network on each example given the current parameter settings
  determine the derivative of the error for each parameter
  change each parameter in proportion to its derivative
until the stopping criteria are met
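A minimal sketch of this loop under simplifying assumptions: a single sigmoid unit with three parameters instead of the full network, sum-of-squares error, and derivatives estimated numerically rather than by the usual analytic back-propagation. The learning rate, step limit, and data are made up; the same loop shape applies to larger parameter vectors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(x, w):
    """A single sigmoid unit with two inputs and a bias (three parameters)."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])

def error(w, examples):
    """Sum-of-squares error over (input, observed value) pairs."""
    return sum((predict(x, w) - t) ** 2 for x, t in examples)

def train(w, examples, eta=0.5, max_steps=5000, tol=1e-6, h=1e-5):
    w = list(w)
    for _ in range(max_steps):                               # Repeat:
        base = error(w, examples)                            #   evaluate on each example
        grads = []
        for i in range(len(w)):                              #   derivative of the error for
            bumped = list(w)                                  #   each parameter (numerical
            bumped[i] += h                                    #   estimate; backprop computes
            grads.append((error(bumped, examples) - base) / h)  # it analytically)
        w = [wi - eta * g for wi, g in zip(w, grads)]         #   step each parameter
        if all(abs(g) < tol for g in grads):                  # until stopping criteria are met
            break
    return w

examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # illustrative data (OR)
w = train([0.0, 0.0, 0.0], examples)
print(w, [round(predict(x, w), 2) for x, _ in examples])  # predictions should approach 0,1,1,1
```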

Page 23: Gradient Descent for Neural Net Learning

Page 24: Bias in neural networks and decision trees

It’s easy for a neural network to represent “at least two of I1, …, Ik are true”:

  w0   w1  …  wk
 -15   10  …  10

This concept forms a large decision tree.

Consider representing a conditional: “If c then a else b”:
  Simple in a decision tree.
  Needs a complicated neural network to represent (c ^ a) V (~c ^ b).
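A small check of these weights, assuming k = 3 inputs, a sigmoid unit, and true/false encoded as 1/0; the loop just evaluates the unit on a few input combinations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def at_least_two(inputs, w0=-15, wi=10):
    """w0 is the bias weight, wi the shared weight on each Ii, as on the slide."""
    return sigmoid(w0 + wi * sum(inputs))

for inputs in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(inputs, round(at_least_two(inputs), 3))
# zero or one input true -> output near 0; two or more true -> output near 1
```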

Page 25: Neural Networks and Logic

Meaning is attached to the input and output units.

There is no a priori meaning associated with the hidden units.

What the hidden units actually represent is something that’s learned.