Multi-Prediction Deep Boltzmann Machines
Goodfellow, Mirza, Courville, Bengio
Vipul Venkataraman Nov 29, 2016
Outline
• Goal of the paper [1]
• A primer on RBMs and DBMs
• Training DBMs
• Proposed method: motivations and intuitions
• Results
• Conclusions
Goal
Goal of the paper
Make training unsupervised models great again!
Deep Boltzmann Machines
Preliminaries
Deep Boltzmann Machines
Image: [2]
Deep Boltzmann Machines
Image: [2]
Training
Deep Boltzmann Machines
• Unsupervised
• Generative model
• Feature learning algorithm
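For reference, a two-layer DBM as defined in [2] assigns an energy to each joint configuration and defines the model distribution through it. Bias terms are omitted here, and the symbols $W^{1}$, $W^{2}$, $h^{1}$, $h^{2}$ follow the notation of [2] rather than anything on these slides.

```latex
E(v, h^{1}, h^{2}; \theta) = -v^{\top} W^{1} h^{1} - (h^{1})^{\top} W^{2} h^{2}
\qquad
p(v; \theta) = \frac{1}{Z(\theta)} \sum_{h^{1}, h^{2}} \exp\bigl(-E(v, h^{1}, h^{2}; \theta)\bigr)
```

The sum over all hidden configurations inside $p(v; \theta)$, together with the partition function $Z(\theta)$, is what makes exact inference and exact likelihood gradients intractable.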
Training Methods
Deep Boltzmann Machines
Classification
• Exact inference is intractable
• Use mean field expectations of the hidden units
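A minimal sketch of mean-field inference in a two-layer DBM, assuming binary units and NumPy. The function name `mean_field`, the parameter names, and the simple bottom-up initialisation are illustrative assumptions, not taken from the slides or from [2].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W1, W2, b1, b2, n_iters=10):
    """Approximate E[h1 | v] and E[h2 | v] by fixed-point iteration.

    v:  (n, d_v) visible data
    W1: (d_v, d_1) weights between v and h1
    W2: (d_1, d_2) weights between h1 and h2
    b1, b2: hidden biases (assumed names)
    """
    # Assumed initialisation: a single bottom-up pass.
    q1 = sigmoid(v @ W1 + b1)
    q2 = sigmoid(q1 @ W2 + b2)
    for _ in range(n_iters):
        # h1 receives input from both the layer below and the layer above.
        q1 = sigmoid(v @ W1 + q2 @ W2.T + b1)
        # h2 only receives input from h1.
        q2 = sigmoid(q1 @ W2 + b2)
    return q1, q2

# Usage with random, untrained parameters (illustration only).
rng = np.random.default_rng(0)
v = (rng.random((4, 6)) < 0.5).astype(float)
W1 = 0.1 * rng.standard_normal((6, 5))
W2 = 0.1 * rng.standard_normal((5, 3))
q1, q2 = mean_field(v, W1, W2, np.zeros(5), np.zeros(3))
```

The returned `q1` and `q2` are the expectations that would be passed to a classifier in place of the intractable exact posterior.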
Training DBMs
Steps [2]:
1. Layer-wise pre-training
• Unsupervised
• RBMs as building blocks
2. Discriminative fine-tuning
• Supervised
• Back-propagation
Good reference: https://www.youtube.com/watch?v=Oq38pINmddk
Pre-training
RBM
Fine-tuning
MLP
Pre-training
• Deep: good features
• Can use any unsupervised algorithm
• RBM (trained with contrastive divergence, CD)
• Auto-encoder
Fine-tuning
• Won’t make drastic changes
• Need less labelled data
• Can use a lot of unlabelled data
Drawbacks
• Greedy training, not considering global interactions
• Many models, criteria
• Extra classifier as well
• CD-k: no principled way to choose k
• Gradient approximation may be bad
An Aside: CD Intuition
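• Low-energy ‘holes’ far away from the data are missed by short chains
• May want our particles to move many steps [3]
• The mixing may get slower as the weights grow
• Increase k as training progresses: CD-1 -> CD-3 -> CD-10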
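A minimal sketch of a CD-k gradient estimate for a binary RBM, assuming NumPy. The function name `cd_k_gradient` and the parameter names (`b` for visible biases, `c` for hidden biases) are illustrative assumptions; only the weight gradient is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_gradient(v0, W, b, c, k, rng):
    """CD-k estimate of the log-likelihood gradient with respect to W.

    v0: (n, d_v) minibatch of binary data
    W:  (d_v, d_h) weights, b: visible biases, c: hidden biases
    """
    ph0 = sigmoid(v0 @ W + c)            # positive phase, driven by the data
    v, ph = v0, ph0
    for _ in range(k):                   # k steps of block Gibbs sampling
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W.T + b)
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(v @ W + c)
    n = v0.shape[0]
    # Data statistics minus the statistics of the k-step "particles".
    return (v0.T @ ph0 - v.T @ ph) / n

# Usage: the same minibatch with the CD-1 -> CD-3 -> CD-10 settings.
rng = np.random.default_rng(0)
v0 = (rng.random((8, 6)) < 0.5).astype(float)
W = 0.1 * rng.standard_normal((6, 4))
b, c = np.zeros(6), np.zeros(4)
grads = {k: cd_k_gradient(v0, W, b, c, k, rng) for k in (1, 3, 10)}
```

A larger k lets the particles wander further from the data before the negative statistics are collected, at the cost of more sampling per update.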
Solutions
Proposed method
• Mantra: Simplify
• Many models -> one model
• Many criteria -> one criterion
• Extra classification layer at the top -> unified model
Quick recap
Multi-Prediction Training
Random bit-mask
Example 1
Example 1, update 1
Example 1, update 2
Two mean-field fixed point updates
Example 2, all updates
Example 3, all updates
One iteration
Minibatch Backprop
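A minimal sketch of the forward pass of one multi-prediction iteration, assuming NumPy, binary inputs, and a two-layer model without a label unit. A random bit-mask hides part of each example, a few mean-field updates predict the hidden part from the observed part, and the loss is the cross-entropy on the masked entries only. The function name `mp_loss`, the masking probability, and the initialisation values are illustrative assumptions; the gradient step is not shown and would come from back-propagating through the unrolled updates with an autodiff library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mp_loss(v, W1, W2, b0, b1, b2, rng, n_iters=5, p_mask=0.5):
    """Cross-entropy on randomly masked inputs after n_iters >= 1 mean-field updates."""
    mask = rng.random(v.shape) < p_mask          # True = hidden from the model
    v_hat = np.where(mask, 0.5, v)               # masked entries start uninformative
    q2 = np.zeros((v.shape[0], W2.shape[1]))     # assumed initial top-layer state
    for _ in range(n_iters):
        q1 = sigmoid(v_hat @ W1 + q2 @ W2.T + b1)
        q2 = sigmoid(q1 @ W2 + b2)
        pred = sigmoid(q1 @ W1.T + b0)           # mean-field estimate of the inputs
        v_hat = np.where(mask, pred, v)          # observed entries stay clamped
    eps = 1e-7
    ce = -(v * np.log(pred + eps) + (1 - v) * np.log(1 - pred + eps))
    return (ce * mask).sum() / max(mask.sum(), 1)

# Usage on one random minibatch with untrained parameters (illustration only).
rng = np.random.default_rng(0)
v = (rng.random((8, 6)) < 0.5).astype(float)
W1 = 0.1 * rng.standard_normal((6, 5))
W2 = 0.1 * rng.standard_normal((5, 3))
loss = mp_loss(v, W1, W2, np.zeros(6), np.zeros(5), np.zeros(3), rng)
```

Because a fresh mask is drawn for every minibatch, the same set of weights is trained on many different inference problems, which is the sense in which one model and one criterion replace the layer-wise pipeline.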
Performance
• Works well (results in a bit)
• Expensive though
• The mean-field fixed-point updates need several iterations to converge
Multi-Inference Trick
Mean field
Multi-inference
average with the mean-field estimate
Nesterov’s accelerated gradient descent
Results
Can someone find me a suitable picture?
Multi-Inference Trick
Image: Goodfellow's defense
Setting
• Dataset: MNIST
• First layer: 500 hidden units
• Second layer: 1000 hidden units
• Minibatch size: 100 examples
• Test set: 10000 examples
• For more related results: [1]
Classification
MP-DBM with 2X hidden units: 0.91% test error
Robustness
Missing inputs
Conclusions
• Simpler, more intuitive methodology for training Deep Boltzmann Machines
• Improved accuracy for approximate inference problems
References
[1] Goodfellow, Mirza, Courville, and Bengio. Multi-prediction deep Boltzmann machines. NIPS 2013.
[2] Salakhutdinov and Hinton. Deep Boltzmann machines. AISTATS 2009.
[3] Hinton. Neural Networks for Machine Learning. Coursera.
Questions?