Neural networks
- Introduction
- Fitting neural networks
- Going beyond single hidden layer
- Brief discussion of deep learning
Neural network
K-class classification: K nodes in top layer
Continuous outcome: Single node in top layer
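K-class classification:
Zm are created from linear combinations of the inputs; Yk is modeled as a function of linear combinations of the Zm.
For classification, one can use the simple identity output gk(T) = Tk, although the softmax function gk(T) = exp(Tk) / Σl exp(Tl) is the more common choice.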
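The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming sigmoid hidden units and either the identity or the softmax output function; the names `alpha0`, `alpha`, `beta0`, and `beta` are hypothetical labels for the two sets of weights, not notation taken from the slides.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax(t):
    # Subtract the row maximum for numerical stability.
    e = np.exp(t - t.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(X, alpha0, alpha, beta0, beta, classify=True):
    """Single-hidden-layer forward pass (assumed sigmoid hidden units)."""
    Z = sigmoid(alpha0 + X @ alpha.T)   # hidden units Zm, shape (N, M)
    T = beta0 + Z @ beta.T              # linear combinations of Zm, shape (N, K)
    return softmax(T) if classify else T  # gk(T): softmax or identity

rng = np.random.default_rng(0)
N, p, M, K = 4, 3, 5, 2                 # illustrative sizes only
X = rng.normal(size=(N, p))
alpha0, alpha = rng.normal(size=M), rng.normal(size=(M, p))
beta0, beta = rng.normal(size=K), rng.normal(size=(K, M))

probs = forward(X, alpha0, alpha, beta0, beta)                    # class probabilities
scores = forward(X, alpha0, alpha, beta0, beta, classify=False)   # identity output
print(probs.sum(axis=1))
```

With the softmax output each row of `probs` sums to one, which is what makes it convenient for K-class problems.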
A simple network with linear functions:
y1: x1 + x2 + 0.5 ≥ 0
y2: x1 + x2 − 1.5 ≥ 0
z1 = +1 if and only if both y1 and y2 have value +1
“bias”: intercept
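The small network above can be written out directly. This is a sketch assuming each unit outputs +1 when its inequality holds and −1 otherwise; the weights (1, 1) and bias −1.5 used for z1 are one hypothetical way to implement "both y1 and y2 are +1" and are not taken from the slide.

```python
import numpy as np

def step(v):
    # Threshold unit: +1 if v >= 0, otherwise -1 (assumed coding).
    return np.where(v >= 0, 1, -1)

def simple_network(x1, x2):
    y1 = step(x1 + x2 + 0.5)    # bias (intercept) +0.5
    y2 = step(x1 + x2 - 1.5)    # bias (intercept) -1.5
    # Assumed weights for z1: fires only when y1 + y2 = 2.
    z1 = step(y1 + y2 - 1.5)
    return y1, y2, z1

for point in [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0)]:
    print(point, [int(u) for u in simple_network(*point)])
```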
Fitting Neural Networks
Set of parameters (weights):
Objective function:
Regression: sum of squared errors
Classification: cross-entropy (deviance)
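As a rough illustration of the two objective functions, the sketch below computes the sum of squared errors and the cross-entropy for a tiny made-up example. It assumes `Y` holds the targets (one-hot rows for classification) and `F` holds the network outputs; the small constant `eps` is an added safeguard against log(0), not part of the definition.

```python
import numpy as np

def sum_of_squares(Y, F):
    # R(theta) for regression: sum over observations and outputs.
    return np.sum((Y - F) ** 2)

def cross_entropy(Y, F, eps=1e-12):
    # R(theta) for classification (deviance): Y is one-hot, F are probabilities.
    return -np.sum(Y * np.log(F + eps))

# Made-up targets and fitted probabilities for two observations, two classes.
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
F = np.array([[0.9, 0.1], [0.2, 0.8]])
sse = sum_of_squares(Y, F)
ce = cross_entropy(Y, F)
print(sse, ce)
```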
The generic approach to minimizing R(θ) is gradient descent, called “back-propagation” in this setting.
Middle-layer values for each data point:
We use the squared-error loss for demonstration:
Derivatives:
Descent along the gradient:
learning rate: step size of the descent
k: output unit index
m: hidden unit index
l: input variable index
i: observation index
By definition
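The formulas themselves are not reproduced in this transcript. The following is a sketch of the standard derivation for squared-error loss, under assumed notation: sigmoid hidden units σ, input-to-hidden weights α, hidden-to-output weights β, output function g_k, and learning rate γ_r at iteration r. These symbols may not match the ones on the slides.

```latex
\begin{align*}
z_{mi} &= \sigma(\alpha_{0m} + \alpha_m^T x_i), \qquad
f_k(x_i) = g_k(\beta_{0k} + \beta_k^T z_i), \\
R(\theta) &= \sum_{i=1}^{N} R_i
           = \sum_{i=1}^{N} \sum_{k=1}^{K} \bigl(y_{ik} - f_k(x_i)\bigr)^2, \\[4pt]
\frac{\partial R_i}{\partial \beta_{km}}
  &= -2\bigl(y_{ik} - f_k(x_i)\bigr)\, g_k'(\beta_{0k} + \beta_k^T z_i)\, z_{mi}
   \;=\; \delta_{ki}\, z_{mi}, \\
\frac{\partial R_i}{\partial \alpha_{ml}}
  &= -\sum_{k=1}^{K} 2\bigl(y_{ik} - f_k(x_i)\bigr)\, g_k'(\beta_{0k} + \beta_k^T z_i)\,
     \beta_{km}\, \sigma'(\alpha_{0m} + \alpha_m^T x_i)\, x_{il}
   \;=\; s_{mi}\, x_{il}, \\[4pt]
\beta_{km}^{(r+1)} &= \beta_{km}^{(r)} - \gamma_r \sum_{i=1}^{N}
     \frac{\partial R_i}{\partial \beta_{km}^{(r)}}, \qquad
\alpha_{ml}^{(r+1)} = \alpha_{ml}^{(r)} - \gamma_r \sum_{i=1}^{N}
     \frac{\partial R_i}{\partial \alpha_{ml}^{(r)}}, \\[4pt]
s_{mi} &= \sigma'(\alpha_{0m} + \alpha_m^T x_i) \sum_{k=1}^{K} \beta_{km}\, \delta_{ki}.
\end{align*}
```

The last line is the back-propagation equation: the hidden-layer errors s_mi are obtained from the output-layer errors δ_ki by passing them backward through the weights β_km.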
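General workflow of back-propagation:
- Forward pass: fix the current weights and compute the predicted values.
- Backward pass: compute the errors at the output layer,
- back-propagate them to compute the errors at the hidden layer,
- use both sets of errors to compute the gradients for the updates,
- update the weights.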
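The workflow above can be sketched for a single-hidden-layer network with squared-error loss. This is an illustrative implementation under stated assumptions (sigmoid hidden units, identity output, batch updates with a fixed learning rate `lr`); the variable names and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(X, params):
    a0, A, b0, B = params              # hidden bias/weights, output bias/weights
    Z = sigmoid(a0 + X @ A.T)          # hidden-layer values, shape (N, M)
    F = b0 + Z @ B.T                   # predictions, shape (N, K), identity output
    return Z, F

def loss(X, Y, params):
    return np.sum((Y - forward(X, params)[1]) ** 2)

def gradients(X, Y, params):
    a0, A, b0, B = params
    Z, F = forward(X, params)          # forward pass with the weights held fixed
    delta = -2.0 * (Y - F)             # output-layer errors, shape (N, K)
    S = Z * (1.0 - Z) * (delta @ B)    # back-propagated hidden-layer errors, (N, M)
    # Gradients summed over observations, in the same order as params.
    return S.sum(axis=0), S.T @ X, delta.sum(axis=0), delta.T @ Z

def update(X, Y, params, lr):
    grads = gradients(X, Y, params)
    return tuple(w - lr * g for w, g in zip(params, grads))

rng = np.random.default_rng(0)
N, p, M, K = 20, 3, 4, 2               # illustrative sizes only
X = rng.normal(size=(N, p))
Y = rng.normal(size=(N, K))
params = (0.1 * rng.normal(size=M), 0.1 * rng.normal(size=(M, p)),
          0.1 * rng.normal(size=K), 0.1 * rng.normal(size=(K, M)))

trained = params
for _ in range(100):
    trained = update(X, Y, trained, lr=1e-3)
print(loss(X, Y, params), loss(X, Y, trained))
```

A finite-difference comparison against `gradients` is a cheap way to check a hand-written backward pass before trusting it.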
Can use parallel computing: each hidden unit passes information to, and receives information from, only the units that share a connection with it.
Online training: the fitting scheme allows the network to handle very large training sets, and also to update the weights as new observations come in.
Training a neural network is an “art”:
- the model is generally overparametrized
- the optimization problem is nonconvex and unstable
A neural network model is a black box and hard to interpret directly.
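The online-training remark can be illustrated by updating the weights after each observation instead of after a full pass. The sketch below assumes a single continuous output, sigmoid hidden units, squared-error loss, and a fixed learning rate; the data stream is synthetic and all names are hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def predict(x, a0, A, b0, b):
    z = sigmoid(a0 + A @ x)            # hidden-layer values, shape (M,)
    return z, b0 + b @ z               # single continuous output

def online_update(x, y, weights, lr):
    """One gradient step using a single observation (x, y)."""
    a0, A, b0, b = weights
    z, f = predict(x, a0, A, b0, b)
    delta = -2.0 * (y - f)             # output error for this observation
    s = z * (1.0 - z) * b * delta      # back-propagated hidden errors
    return (a0 - lr * s, A - lr * np.outer(s, x),
            b0 - lr * delta, b - lr * delta * z)

def mean_squared_error(X, Y, weights):
    return np.mean([(y - predict(x, *weights)[1]) ** 2 for x, y in zip(X, Y)])

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))  # synthetic inputs
Y = X[:, 0] - 0.5 * X[:, 1]            # synthetic target
M = 3
weights = (np.zeros(M), 0.1 * rng.normal(size=(M, 2)), 0.0, 0.1 * rng.normal(size=M))

before = mean_squared_error(X, Y, weights)
for epoch in range(50):
    for x, y in zip(X, Y):             # observations arrive one at a time
        weights = online_update(x, y, weights, lr=0.05)
after = mean_squared_error(X, Y, weights)
print(before, after)
```

Because each update touches only one observation, the same loop works when new data keep arriving and the full training set never fits in memory.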
Initialization
When the weight vectors are close to length zero, the linear combinations feeding the hidden units are close to zero, where the sigmoid curve is close to linear. The overall model is then close to linear, a relatively simple model. (This can be seen as a regularized solution.)
Start with very small weights (random values near zero).
Let the neural network learn necessary nonlinear relations from the data.
Starting with large weights often leads to poor solutions.
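The near-linearity of the sigmoid around zero is easy to check numerically. The sketch below compares the sigmoid with its first-order approximation 0.5 + v/4 on a small and a large input range; the ranges are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def linear_approx(v):
    # First-order Taylor expansion of the sigmoid around v = 0.
    return 0.5 + v / 4.0

small = np.linspace(-0.1, 0.1, 5)      # inputs produced by small weights
large = np.linspace(-5.0, 5.0, 5)      # inputs produced by large weights

err_small = np.max(np.abs(sigmoid(small) - linear_approx(small)))
err_large = np.max(np.abs(sigmoid(large) - linear_approx(large)))
print(err_small, err_large)
```

The approximation error is negligible on the small range and large on the wide one, which is why a network started at small weights behaves like a linear model at first.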
Overfitting
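The model is very flexible and involves many parameters, so it can easily overfit the data.
Early stopping: stop the algorithm before it converges. Because the model starts out nearly linear, this gives a regularized solution (shrunk toward a linear model).
Explicit regularization (“weight decay”): minimize R(θ) + λJ(θ), where J(θ) is the sum of the squared weights.
An alternative, the weight elimination penalty, tends to shrink smaller weights more than weight decay does.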
Cross-validation is used to estimate λ.
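A short sketch of the penalized objective follows. It assumes the weight decay penalty is the sum of squared weights and that the remark about shrinking smaller weights more refers to the weight elimination penalty, the sum of w² / (1 + w²); the latter attribution is an assumption, since the formulas are not in the transcript.

```python
import numpy as np

def weight_decay(w):
    # J(theta) = sum of squared weights.
    return np.sum(w ** 2)

def weight_elimination(w):
    # Assumed alternative penalty: sum of w^2 / (1 + w^2).
    return np.sum(w ** 2 / (1.0 + w ** 2))

def penalized_objective(R, w, lam, penalty=weight_decay):
    # R(theta) + lambda * J(theta)
    return R + lam * penalty(w)

def relative_shrinkage(w, kind="decay"):
    """Penalty gradient divided by the weight: how hard each weight is pulled to 0."""
    if kind == "decay":
        return 2.0 * np.ones_like(w)       # d/dw w^2 = 2w
    return 2.0 / (1.0 + w ** 2) ** 2       # d/dw w^2/(1+w^2) = 2w/(1+w^2)^2

w = np.array([0.1, 3.0])
print(relative_shrinkage(w, "decay"))
print(relative_shrinkage(w, "elimination"))
```

Under weight decay every weight is pulled toward zero at the same relative rate, while under weight elimination the small weight is pulled much harder than the large one.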
Number of Hidden Units and Layers
Too few – might not have enough flexibility to capture the nonlinearities in the data
Too many – overly flexible, BUT extra weights can be shrunk toward zero if appropriate regularization is used. ✔
Typical range: 5-100
Cross-validation can be used to choose the number of hidden units, but this may be unnecessary if cross-validation is already used to tune the regularization parameter.
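To see how quickly flexibility grows with the number of hidden units, one can count the weights of a single-hidden-layer network. The sketch assumes p inputs, M hidden units, K outputs, and one bias per unit; the example sizes are arbitrary.

```python
def n_weights(p, M, K):
    # M hidden units with p weights + 1 bias each,
    # K output units with M weights + 1 bias each.
    return M * (p + 1) + K * (M + 1)

for M in (5, 20, 100):
    print(M, n_weights(p=10, M=M, K=2))
```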
Examples
“A radial function is in a sense the most difficult for the neural net, as it is spherically symmetric and with no preferred directions.”
Going beyond single hidden layer
A benchmark problem: classification of handwritten numerals.
Weight sharing: the same operation is applied to different parts of the image. Each of the units in a single 8 × 8 feature map shares the same set of nine weights (but has its own bias parameter).
[Figure: two locally connected networks using 3x3 and 5x5 patches, one with no weight sharing and one with shared weights]
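Weight sharing can be sketched as one 3 × 3 set of weights slid across the image, with a separate bias for every unit of the 8 × 8 feature map. The input size (16 × 16), the stride of 2, and the zero padding on the bottom and right edges are assumptions made so that the output comes out 8 × 8; they are not stated in the transcript.

```python
import numpy as np

def feature_map(image, kernel, bias, stride=2):
    """Apply one shared kernel to every patch; each unit adds its own bias."""
    k = kernel.shape[0]
    out = bias.shape[0]
    # Assumed zero padding so that the last patches fit inside the image.
    pad = max((out - 1) * stride + k - image.shape[0], 0)
    padded = np.pad(image, ((0, pad), (0, pad)))
    result = np.empty((out, out))
    for r in range(out):
        for c in range(out):
            patch = padded[r * stride:r * stride + k, c * stride:c * stride + k]
            result[r, c] = np.sum(patch * kernel) + bias[r, c]
    return result  # pre-activation values; a sigmoid could be applied afterwards

def n_parameters(out=8, k=3, shared=True):
    n_units = out * out
    n_kernel_weights = k * k if shared else n_units * k * k
    return n_kernel_weights + n_units      # plus one bias per unit

image = np.ones((16, 16))
fmap = feature_map(image, kernel=np.ones((3, 3)), bias=np.zeros((8, 8)))
print(fmap.shape, n_parameters(shared=True), n_parameters(shared=False))
```

With sharing, a feature map needs 9 weights plus 64 biases; without sharing, the same 64 units need 9 weights each plus their biases.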
Deep learning
Data → Features → Model
Finding the correct features is critical to success:
- Kernels in SVM
- Hidden layer nodes in neural network
- Predictor combinations in RF
A successful machine learning technology needs to be able to extract useful features (data representations) on its own.
Deep learning methods:
- Composition of multiple non-linear transformations of the data
- Goal: more abstract, and ultimately more useful, representations
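"Composition of multiple non-linear transformations" can be made concrete with a few stacked layers. The sketch below is only an illustration: the tanh non-linearity, the layer sizes, and the random (untrained) weights are all assumptions.

```python
import numpy as np

def deep_representation(x, layers):
    """Pass x through a sequence of non-linear transformations."""
    h = x
    for W, b in layers:
        h = np.tanh(W @ h + b)     # each layer re-represents the previous one
    return h

rng = np.random.default_rng(0)
sizes = [8, 6, 4, 2]               # input dimension followed by layer widths
layers = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=8)
features = deep_representation(x, layers)
print(features)
```

In a trained deep network the weights of every layer are learned, so the final `features` are a data representation extracted by the model itself.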
IEEE Trans Pattern Anal Mach Intell. 2013 Aug;35(8):1798-828
Nature 505, 146–148 (09 January 2014)
A deep learning method has to learn high-level abstract concepts from data.
Ex: wheels of a car; eyes, nose, etc. of a face.
It should be very resistant to irrelevant information.
Ex: a car’s orientation.
Major areas of application:
- Speech Recognition and Signal Processing
- Object Recognition
- Natural Language Processing ……
So far in bioinformatics:
- Training data size (number of subjects) is still too small compared to the number of variables (the N << p issue).
- Deep learning could be applied when human selection of variables is done first.
- Biological knowledge, in the form of existing networks, is already used explicitly instead of being learned from data. Such knowledge-based methods are hard to beat with a limited amount of data.