Neural Networks with Google’s TensorFlow -...
Overview
1. Neural Networks basics
2. Neural Networks specifics
3. Neural Networks with Google’s TensorFlow
4. Coreference: Singleton classification example
Resources
• Deep learning course (Google) @ Udacity
• Machine learning course (Stanford, Andrew Ng) @ Coursera
• Neural Network course (Geoffrey Hinton) @ Coursera
Pros and cons of linear models
Pros:
• Fast
• Numerically stable
• Derivative is constant
Cons:
• Limited to modeling additive features
• Multiplicative or higher-order features lead to a huge parameter space, so linear models are not suitable for non-linear mapping
Conclusion:
We want to keep the parameters inside linear functions but still be able to do non-linear mapping efficiently.
ReLU – a non-linear activation function to put in the hidden layer
ReLU is one of many choices of a non-linear activation function.
https://en.wikipedia.org/wiki/Activation_function
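As a minimal sketch of the idea, the hidden layer below applies a linear function to its inputs and then passes each result through ReLU. The function names (`relu`, `hidden_layer`) and the weights are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clamp the rest to 0."""
    return max(0.0, x)


def hidden_layer(inputs, weights, biases):
    """One hidden layer: a linear function of the inputs followed by ReLU."""
    outputs = []
    for row, bias in zip(weights, biases):
        linear = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(relu(linear))
    return outputs


# Hypothetical weights: the first unit ends up positive, the second negative.
activations = hidden_layer([1.0, -2.0], [[0.5, -0.25], [-1.0, 1.0]], [0.0, 0.5])
print(activations)
```

The parameters still live inside linear functions; the non-linearity comes only from clamping negative values to zero.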
Training a neural network
• Basically similar to training a linear model: optimize a cost function using a method like gradient descent (unlike the linear case, the cost function of a neural network is generally not convex)
• Example cost function for logistic based activation
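A common cost function for a logistic (sigmoid) output is the cross-entropy cost. The sketch below assumes that choice; the symbols are assumptions for illustration: m is the number of training examples, y^{(i)} the gold label of example i, and h_W(x^{(i)}) the predicted probability.

```latex
J(W) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_W(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_W(x^{(i)})\right) \right],
\qquad
h_W(x) = \frac{1}{1 + e^{-Wx}}
```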
Cost function – the same idea applies to a linear classifier or an NN
• The cost function is a function of the parameters that captures the difference between the predicted and gold labels, so we want to minimize it.
• How to minimize? Using gradient descent, at each iteration, adjust the weights.
• How to adjust weights? Subtracting gradient (derivative) will move you toward the minimum.
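The update rule can be sketched on a one-parameter toy problem. The cost function (w - 3)^2, the learning rate, and the number of steps below are assumptions chosen for illustration only.

```python
def gradient_descent(gradient, w, learning_rate=0.1, steps=100):
    """Repeatedly subtract the scaled gradient from the weight."""
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w


def cost(w):
    """Toy cost function with its minimum at w = 3."""
    return (w - 3.0) ** 2


def cost_gradient(w):
    """Derivative of the toy cost function with respect to w."""
    return 2.0 * (w - 3.0)


w_min = gradient_descent(cost_gradient, w=0.0)
print(w_min, cost(w_min))
```

Each step shrinks the distance to the minimum by a constant factor here; a learning rate that is too large would overshoot instead.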
Gradient descent
• Keep in mind that W is a matrix, so we need to compute the partial derivative with respect to each element of W, summing each one over the training examples.
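Written out element by element, and assuming the cost J is the average of per-example costs J^{(i)} over m training examples with learning rate \alpha (symbols chosen here for illustration), the update looks like this:

```latex
\frac{\partial J}{\partial W_{jk}} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial J^{(i)}}{\partial W_{jk}},
\qquad
W_{jk} \leftarrow W_{jk} - \alpha \, \frac{\partial J}{\partial W_{jk}}
```

The gradient therefore has the same shape as W, with one entry per weight.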
Gradient Descent flavors
• Batch GD: the classic approach. Sums the derivatives over all training examples at each iteration to perform one weight update. Very slow but more stable; almost never used today.
• Stochastic GD: takes only one example at each iteration and uses the gradient computed from that example to adjust the weights. Fast, but less stable.
• Mini-batch GD (in between): takes a mini-batch of examples (for example, 100 to 2000) and sums the derivatives of those examples to perform one update. Balances stability and speed, gives good results, and is the most used today.
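The three flavors differ only in how many examples feed each update, which the sketch below exposes as a `batch_size` parameter. It fits a line to noise-free toy data; the function name, the data, and the hyperparameter values are assumptions for illustration, and the batch gradient is averaged (the sum scaled by the batch size) so that one learning rate works for any batch size.

```python
import random


def minibatch_gd(xs, ys, batch_size, learning_rate=0.5, epochs=200, seed=0):
    """Fit y = w * x + b by gradient descent on half the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Accumulate the per-example gradients over the current batch.
            grad_w = grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += error * xs[i]
                grad_b += error
            # One weight update per batch, using the averaged gradient.
            w -= learning_rate * grad_w / len(batch)
            b -= learning_rate * grad_b / len(batch)
    return w, b


# Noise-free toy data drawn from y = 2x + 1 (hypothetical, for illustration).
data_rng = random.Random(1)
xs = [data_rng.random() for _ in range(200)]
ys = [2.0 * x + 1.0 for x in xs]

w, b = minibatch_gd(xs, ys, batch_size=20)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Setting `batch_size=1` gives stochastic GD, and `batch_size=len(xs)` gives batch GD with a single update per pass over the data.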
Neural Network training: forward backward propagation
Intuition from linear classifier:
Repeat:
• Compute an output
• Compute error
• Adjust weights
(my implementation in Octave)
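The same three-step loop can be sketched in plain Python for a network with one ReLU hidden layer and a sigmoid output. This is not the Octave implementation referred to above; the layer sizes, starting weights, training example, and learning rate are all assumptions chosen for illustration.

```python
import math


def forward(x, W1, b1, W2, b2):
    """Forward pass: linear -> ReLU -> linear -> sigmoid."""
    z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, z) for z in z1]
    z2 = sum(w * hi for w, hi in zip(W2, h)) + b2
    p = 1.0 / (1.0 + math.exp(-z2))
    return z1, h, p


def cross_entropy(p, y):
    """Cost of predicting probability p when the gold label is y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def backward(x, y, W2, z1, h, p):
    """Backward pass: propagate the output error back to every weight."""
    dz2 = p - y  # error at the output (sigmoid + cross-entropy)
    dW2 = [dz2 * hi for hi in h]
    db2 = dz2
    # ReLU passes the error back only through units that were active.
    dz1 = [dz2 * w * (1.0 if z > 0 else 0.0) for w, z in zip(W2, z1)]
    dW1 = [[d * xi for xi in x] for d in dz1]
    db1 = dz1
    return dW1, db1, dW2, db2


def train_step(x, y, W1, b1, W2, b2, learning_rate=0.1):
    z1, h, p = forward(x, W1, b1, W2, b2)               # compute an output
    dW1, db1, dW2, db2 = backward(x, y, W2, z1, h, p)   # compute error
    # Adjust weights: subtract the scaled gradient from each parameter.
    W1 = [[w - learning_rate * g for w, g in zip(row, grow)]
          for row, grow in zip(W1, dW1)]
    b1 = [b - learning_rate * g for b, g in zip(b1, db1)]
    W2 = [w - learning_rate * g for w, g in zip(W2, dW2)]
    b2 = b2 - learning_rate * db2
    return W1, b1, W2, b2


# Hypothetical starting weights and a single training example.
W1 = [[0.5, 0.3], [0.2, -0.8]]
b1 = [0.1, -0.1]
W2 = [0.7, -0.4]
b2 = 0.05
x, y = [1.0, 2.0], 1.0

loss_before = cross_entropy(forward(x, W1, b1, W2, b2)[2], y)
for _ in range(50):
    W1, b1, W2, b2 = train_step(x, y, W1, b1, W2, b2)
loss_after = cross_entropy(forward(x, W1, b1, W2, b2)[2], y)
print(loss_before, loss_after)
```

A real training run would loop over many examples (typically in mini-batches) rather than repeating one example, but the forward, backward, and update steps stay the same.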
Hyperparameter tuning (loss curve)
• Number of hidden nodes
• Learning rate
• Batch size
• Number of steps
• Overfitting
Example code for notMNIST dataset (Udacity)
• https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/udacity (This set of IPython notebooks is only a partial implementation, since it is meant to be an assignment to be completed. To view a complete implementation, refer to the .ipynb and HTML files I uploaded on the corpling server.)