Neural Networks: Introduction
Machine Learning, Spring 2020
The slides are partly from Vivek Srikumar
Where are we?
Learning algorithms
• Decision Trees
• AdaBoost
• Bagging
• Least Mean Square
• Perceptron
• Support Vector Machines
• Kernel SVM
• Kernel Perceptron
General learning principles
• Overfitting
• Mistake-bound learning
• PAC learning, sample complexity
• Hypothesis choice & VC dimensions
• Training and generalization errors
• Regularized Empirical Risk Minimization
These algorithms all produce linear classifiers.
So far, we’ve seen how to use kernel tricks to extend linear classifiers to nonlinear ones. What if we want to directly train a non-linear classifier?
Where do the features come from?
Neural Networks
• What is a neural network?
• Predicting with a neural network
• Training neural networks
• Practical concerns
This lecture
• What is a neural network?
  – The hypothesis class
  – Structure, expressiveness
• Predicting with a neural network
• Training neural networks
• Practical concerns
We have seen linear threshold units
[Diagram: input features → dot product → threshold]
Prediction: $\mathrm{sgn}(\mathbf{w}^\top \mathbf{x} + b) = \mathrm{sgn}\left(\sum_i w_i x_i + b\right)$
Learning: various algorithms (perceptron, SVM, …); in general, minimize a loss.
But where do these input features come from?
What if the features were outputs of another classifier?
Features from classifiers
Each of these connections has its own weight as well.
This is a two-layer feed-forward neural network.
[Diagram labels: the input layer, the hidden layer, the output layer]
Think of the hidden layer as learning a good representation of the inputs
The dot product followed by the threshold constitutes a neuron
Five neurons in this picture (four in the hidden layer and one output).
But where do the inputs come from?
What if the inputs were the outputs of a classifier?
We can make a three-layer network, and so on.
Let us try to formalize this
Neural networks
• A robust approach for approximating real-valued, discrete-valued, or vector-valued functions
• Among the most effective general-purpose supervised learning methods currently known
  – Especially for complex and hard-to-interpret data such as real-world sensory data
• The Backpropagation algorithm for neural networks has been shown successful in many practical problems
  – Handwritten character recognition, speech recognition, object recognition, some NLP problems
Biological neurons
The first drawing of brain cells, by Santiago Ramón y Cajal in 1899
Neurons: core components of the brain and the nervous system, consisting of
1. Dendrites that collect information from other neurons
2. An axon that generates outgoing spikes
Modern artificial neurons are “inspired” by biological neurons
But there are many, many fundamental differences
Don’t take the similarity too seriously (nor the claims in the news about the “emergence” of intelligent behavior).
Artificial neurons
Functions that very loosely mimic a biological neuron
A neuron accepts a collection of inputs (a vector x) and produces an output by:
1. Applying a dot product with weights w and adding a bias b
2. Applying a (possibly non-linear) transformation called an activation
$\mathrm{output} = \mathrm{activation}(\mathbf{w}^\top \mathbf{x} + b)$
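As a concrete illustration, the two steps can be written out in a few lines of Python (a minimal sketch; the weights, inputs, and bias below are made-up values, and the sign function serves as the activation):

```python
# A single artificial neuron: dot product plus bias, then an activation.
def sgn(z):
    # Threshold (sign) activation: +1 for non-negative inputs, -1 otherwise.
    return 1 if z >= 0 else -1

def neuron(w, x, b, activation=sgn):
    # Step 1: dot product with weights w, plus bias b.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Step 2: apply the (possibly non-linear) activation.
    return activation(z)

# Example with made-up weights and inputs.
w = [0.5, -1.0]
x = [2.0, 1.0]
b = -0.25
print(neuron(w, x, b))  # sgn(0.5*2 - 1.0*1 - 0.25) = sgn(-0.25) = -1
```

Swapping in a different `activation` argument gives the other neuron types discussed next.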
Other activations are possible
Activation functions
| Name of the neuron | Activation function: activation(z) |
|---|---|
| Linear unit | $z$ |
| Threshold/sign unit | $\mathrm{sgn}(z)$ |
| Sigmoid unit | $\frac{1}{1 + \exp(-z)}$ |
| Rectified linear unit (ReLU) | $\max(0, z)$ |
| Tanh unit | $\tanh(z)$ |
$\mathrm{output} = \mathrm{activation}(\mathbf{w}^\top \mathbf{x} + b)$
Many more activation functions exist (sinusoid, sinc, Gaussian, polynomial, …)
Also called transfer functions
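The activations in the table can be written directly in code (a sketch in Python using only the standard library; the function names are mine, not from any particular framework):

```python
import math

# Activation functions applied to the pre-activation z = w^T x + b.

def linear(z):
    return z                           # linear unit: identity

def sign_unit(z):
    return 1 if z >= 0 else -1         # threshold/sign unit

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # squashes z into (0, 1)

def relu(z):
    return max(0.0, z)                 # zero for negatives, identity otherwise

def tanh_unit(z):
    return math.tanh(z)                # squashes z into (-1, 1)

print(sigmoid(0.0), relu(-3.0), relu(2.0))  # 0.5 0.0 2.0
```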
A neural network
A function that converts inputs to outputs, defined by a directed acyclic graph
– Nodes, organized in layers, correspond to neurons
– Edges carry the output of one neuron to another; each edge is associated with a weight
• To define a neural network, we need to specify:
  – The structure of the graph: how many nodes and the connectivity
  – The activation function on each node
  – The edge weights
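Once the graph, activations, and weights are fixed, prediction is a layer-by-layer computation. A minimal sketch in Python (the weights and layer sizes below are made-up, and the sign function is used as the activation on every node):

```python
def sgn(z):
    # Threshold (sign) activation.
    return 1 if z >= 0 else -1

def layer(W, b, x, activation):
    # One layer: each row of W and entry of b defines one neuron.
    return [activation(sum(wi * xi for wi, xi in zip(w_row, x)) + bi)
            for w_row, bi in zip(W, b)]

def two_layer_network(W1, b1, W2, b2, x):
    # Hidden layer: features computed from the raw inputs.
    h = layer(W1, b1, x, sgn)
    # Output layer: a classifier over the hidden features.
    return layer(W2, b2, h, sgn)[0]

# Made-up example: 2 inputs, 2 hidden neurons, 1 output.
W1 = [[1.0, 1.0], [-1.0, 1.0]]
b1 = [-0.5, 0.5]
W2 = [[1.0, 1.0]]
b2 = [-1.5]
print(two_layer_network(W1, b1, W2, b2, [1.0, 0.0]))  # prints -1
```

Deeper networks just repeat the `layer` call, feeding each layer’s outputs into the next.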
[Diagram: input, hidden, and output layers, with edge weights $w^{1}_{ij}$ (input to hidden) and $w^{2}_{ij}$ (hidden to output)]
Called the architecture of the network. Typically predefined, part of the design of the classifier.
The edge weights are learned from data.
A brief history of neural networks
• 1943: McCulloch and Pitts showed how linear threshold units can compute logical functions
• 1949: Hebb suggested a learning rule that has some physiological plausibility
• 1950s: Rosenblatt proposed the Perceptron algorithm for a single threshold neuron
• 1969: Minsky and Papert studied the neuron from a geometrical perspective
• 1980s: Convolutional neural networks (Fukushima, LeCun), the backpropagation algorithm (various)
• 2003-today: More compute, more data, deeper networks
See also: http://people.idsia.ch/~juergen/deep-learning-overview.html
What functions do neural networks express?
A single neuron with threshold activation
Prediction = $\mathrm{sgn}(b + w_1 x_1 + w_2 x_2)$
[Plot: positively and negatively labeled points in the plane, separated by the decision boundary $b + w_1 x_1 + w_2 x_2 = 0$]
Two layers, with threshold activations
In general, convex polygons
Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014
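The convex-polygon claim can be checked directly: each hidden threshold unit tests one halfspace, and the output unit ANDs them, so the network fires exactly on the intersection of the halfspaces. A sketch in Python (the particular halfspaces, which define a unit square, are made-up for illustration):

```python
def step(z):
    # 0/1 threshold unit.
    return 1 if z >= 0 else 0

def in_polygon(x, halfspaces):
    # Hidden layer: one threshold unit per halfspace (w, b),
    # each firing when w . x + b >= 0.
    h = [step(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in halfspaces]
    # Output unit computes AND: fires only if all hidden units fire.
    return step(sum(h) - len(halfspaces))

# The unit square as the intersection of four halfspaces.
square = [
    ([1.0, 0.0], 0.0),   # x1 >= 0
    ([-1.0, 0.0], 1.0),  # x1 <= 1
    ([0.0, 1.0], 0.0),   # x2 >= 0
    ([0.0, -1.0], 1.0),  # x2 <= 1
]
print(in_polygon([0.5, 0.5], square))  # inside  -> 1
print(in_polygon([2.0, 0.5], square))  # outside -> 0
```

Replacing the AND in the output unit with an OR over several such hidden blocks gives the unions of convex polygons described on the next slide.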
Three layers with threshold activations
In general, unions of convex polygons
Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014
Neural networks are universal function approximators
• Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]
• Approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]
• Two-layer threshold networks can express any Boolean function

• VC dimension of a threshold network with edge set E: $VC = O(|E| \log |E|)$

• VC dimension of sigmoid networks with node set V and edge set E:
  – Upper bound: $O(|V|^2 |E|^2)$
  – Lower bound: $\Omega(|E|^2)$
Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness.
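A hint for the exercise: a composition of linear (affine) maps is itself linear (affine). For two layers with weight matrices $W_1, W_2$ and biases $\mathbf{b}_1, \mathbf{b}_2$:

```latex
\hat{y} = W_2 (W_1 \mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2
        = \underbrace{(W_2 W_1)}_{W}\,\mathbf{x}
          + \underbrace{(W_2 \mathbf{b}_1 + \mathbf{b}_2)}_{\mathbf{b}}
        = W \mathbf{x} + \mathbf{b}
```

So any stack of purely linear layers collapses to a single linear layer; the non-linear activations are what give depth its extra expressive power.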