ML with Tensorflow (hunkim.github.io/ml/lec7.pdf, 2017-10-02)
Lecture 7-1 Application & Tips:
Learning rate, data preprocessing, overfitting
Sung Kim <[email protected]>
https://www.udacity.com/course/viewer#!/c-ud730/l-6370362152/m-6379811827
Gradient descent
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
Large learning rate: overshooting
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
Small learning rate: training takes too long, and may stop at a local minimum
Try several learning rates
• Observe the cost function
• Check that it decreases at a reasonable rate
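A minimal sketch of this advice, assuming a toy cost f(w) = w² (whose gradient is 2w) so the effect of each rate is easy to see:

```python
# Try several learning rates on a toy cost f(w) = w**2 and watch the cost.
def descend(lr, steps=20, w0=10.0):
    """Run gradient descent and record the cost after each step."""
    w, costs = w0, []
    for _ in range(steps):
        w -= lr * 2 * w        # gradient of w**2 is 2*w
        costs.append(w * w)    # cost after the step
    return costs

for lr in (1.1, 0.1, 0.01):
    costs = descend(lr)
    trend = "diverges (overshooting)" if costs[-1] > costs[0] else "decreases"
    print(f"lr={lr}: cost {costs[0]:.3g} -> {costs[-1]:.3g} ({trend})")
```

The too-large rate makes the cost blow up, the tiny rate barely moves it, and the middle one drives it down quickly.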
Data (X) preprocessing for gradient descent
| x1 | x2    | y |
|----|-------|---|
| 1  | 9000  | A |
| 2  | -5000 | A |
| 4  | -2000 | B |
| 6  | 8000  | B |
| 9  | 9000  | C |
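The x2 column above spans thousands while x1 spans single digits, which distorts the gradient-descent cost surface. A NumPy sketch of standardizing each column of this table (zero mean, unit standard deviation):

```python
import numpy as np

# The x1/x2 columns from the table above, on wildly different scales.
X = np.array([[1, 9000],
              [2, -5000],
              [4, -2000],
              [6, 8000],
              [9, 9000]], dtype=float)

# Standardize each column: subtract its mean, divide by its std.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0).round(6))  # each column mean ~ 0
print(X_std.std(axis=0).round(6))   # each column std  ~ 1
```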
Standardization
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
(excerpt: Chapter 2, p. 41)
Many machine learning algorithms that we will encounter throughout this book require some sort of feature scaling for optimal performance, which we will discuss in more detail in Chapter 3, A Tour of Machine Learning Classifiers Using Scikit-learn. Gradient descent is one of the many algorithms that benefit from feature scaling. Here, we will use a feature scaling method called standardization, which gives our data the property of a standard normal distribution: each feature column is centered at mean 0 and has a standard deviation of 1. For example, to standardize the j-th feature, we simply subtract the sample mean μ_j from every training sample and divide by its standard deviation σ_j:

x′_j = (x_j − μ_j) / σ_j

Here x_j is a vector consisting of the j-th feature values of all n training samples.
Standardization can easily be achieved using the NumPy methods mean and std:
>>> X_std = np.copy(X)
>>> X_std[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
>>> X_std[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()
After standardization, we will train the Adaline again and see that it now converges using a learning rate η = 0.01:
>>> ada = AdalineGD(n_iter=15, eta=0.01)
>>> ada.fit(X_std, y)
>>> plot_decision_regions(X_std, y, classifier=ada)
>>> plt.title('Adaline - Gradient Descent')
>>> plt.xlabel('sepal length [standardized]')
>>> plt.ylabel('petal length [standardized]')
>>> plt.legend(loc='upper left')
>>> plt.show()
>>> plt.plot(range(1, len(ada.cost_) + 1), ada.cost_, marker='o')
>>> plt.xlabel('Epochs')
>>> plt.ylabel('Sum-squared-error')
>>> plt.show()
Overfitting
• Our model fits the training dataset very well (it has memorized it)
• but performs poorly on the test dataset or in real use
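One way to see this memorization effect is a toy curve-fitting sketch (assumed setup: polynomials of low and high degree fitted to a few noisy samples of a sine curve; the high-degree fit passes through the training points but does worse on unseen points):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 8)  # noisy samples
x_test = np.linspace(0.05, 0.95, 50)     # unseen points
y_test = np.sin(2 * np.pi * x_test)      # true (noise-free) values

for degree in (3, 7):
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train error {train_err:.4f}, test error {test_err:.4f}")
```

The degree-7 polynomial interpolates all 8 training points (near-zero training error, including the noise), yet its error on the unseen points stays clearly higher.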
Solutions for overfitting
• More training data!
• Reduce the number of features
• Regularization
Regularization
• Let’s not let the weights get too large: add a penalty on the weight values to the cost function
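A sketch of what this means in code, not the lecture's exact formula: an L2 (squared-weight) penalty added to a mean-squared-error cost, with `lam` as the regularization strength:

```python
import numpy as np

def l2_cost(w, X, y, lam):
    """Mean-squared-error data loss plus an L2 weight penalty."""
    data_loss = np.mean((X @ w - y) ** 2)
    reg_loss = lam * np.sum(w ** 2)   # grows with the weight magnitudes
    return data_loss + reg_loss

# Made-up numbers just to show the penalty at work.
w = np.array([0.5, -3.0])
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 0.0])
print(l2_cost(w, X, y, lam=0.0))   # data loss only
print(l2_cost(w, X, y, lam=0.1))   # larger: big weights are penalized
```

Because the penalty feeds into the gradient, descent is biased toward small weights, which keeps the decision boundary smooth.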
Summary
• Learning rate
• Data preprocessing
• Overfitting
  - More training data
  - Regularization
Performance evaluation: is this good?
Evaluation using the training set?
• Accuracy can reach 100%
• because the model can simply memorize the training data
Training and test sets
http://www.holehouse.org/mlclass/10_Advice_for_applying_machine_learning.html
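A minimal NumPy sketch of such a split (the 70/30 ratio and the random data are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))          # made-up features
y = rng.integers(0, 2, size=100)       # made-up labels

idx = rng.permutation(len(X))          # shuffle before splitting
cut = int(0.7 * len(X))
train_idx, test_idx = idx[:cut], idx[cut:]
X_train, y_train = X[train_idx], y[train_idx]
X_test,  y_test  = X[test_idx],  y[test_idx]
print(len(X_train), len(X_test))       # 70 30
```

Train only on the first part; report accuracy only on the held-out part, which the model never saw.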
Training, validation and test sets
http://www.intechopen.com/books/advances-in-data-mining-knowledge-discovery-and-applications/selecting-representative-data-sets
Online learning
http://www.intechopen.com/books/advances-in-data-mining-knowledge-discovery-and-applications/selecting-representative-data-sets
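A sketch of the idea, with assumed names: data arrives in batches over time, and each batch updates the same model in place, so old data need not be stored.

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = np.zeros(1), 0.0            # linear model y = w*x + b
eta = 0.5                          # learning rate

def partial_fit(w, b, X, y, eta):
    """One gradient step on a newly arrived batch (squared-error loss)."""
    err = X @ w + b - y
    w = w - eta * X.T @ err / len(y)
    b = b - eta * err.mean()
    return w, b

# Stream of 200 batches, each freshly drawn from y = 3x + 1.
for _ in range(200):
    X_batch = rng.uniform(-1, 1, size=(32, 1))
    y_batch = 3.0 * X_batch[:, 0] + 1.0
    w, b = partial_fit(w, b, X_batch, y_batch, eta)

print(w[0].round(2), round(b, 2))  # close to the true 3 and 1
```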
MNIST Dataset
http://yann.lecun.com/exdb/mnist/
Accuracy
• How many of your predictions are correct?
• 95% ~ 99%?
• Check out the lab video
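Computing accuracy is a one-liner; a sketch with made-up labels:

```python
import numpy as np

# Accuracy = fraction of predictions that match the true labels.
y_true = np.array([0, 1, 2, 2, 1, 0, 1, 2, 0, 1])
y_pred = np.array([0, 1, 2, 1, 1, 0, 1, 2, 0, 2])
accuracy = np.mean(y_pred == y_true)
print(f"{accuracy:.0%}")   # 8 of 10 predictions match
```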