Deep Belief Networks
Psychology 209, February 22, 2013
Why a Deep Network?
• Why not just one layer of hidden units?
• Fails to capture constraints on the problem.
• For many problems, requires exponential hardware.
• Two examples:
  – Parity (sketched below)
  – Letters × positions
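As an illustration of the parity example (my sketch, not from the slides): one natural single-hidden-layer construction dedicates a hidden unit to each odd-parity input pattern, which takes on the order of 2^(n-1) units, whereas a deeper circuit of pairwise XORs gets by with a number of units linear in n.

```python
# Minimal sketch: parity of n bits computed by a deep chain of pairwise XORs.
# Each XOR is itself a tiny two-step threshold circuit, so the whole network
# uses O(n) units arranged in O(log n) layers.

def xor(a: int, b: int) -> int:
    # XOR as a small threshold circuit: OR(a, b) AND NOT AND(a, b)
    or_ab = int(a + b >= 1)
    and_ab = int(a + b >= 2)
    return int(or_ab - and_ab >= 1)

def deep_parity(bits):
    """Reduce the input layer by layer, XOR-ing adjacent pairs."""
    layer = list(bits)
    while len(layer) > 1:
        nxt = [xor(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:          # carry the odd bit up unchanged
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

assert deep_parity([1, 0, 1, 1]) == 1          # three ones -> odd parity
assert deep_parity([1, 0, 1, 0, 1, 1]) == 0    # four ones -> even parity
```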
But, says LeCun…
Stacked Auto-Encoders
• To capture intermediate-level structure, one might use stacked auto-encoders (see the layer-wise sketch below).
• But training can be very slow as more layers are added.
  – Backprop slows exponentially in the number of layers.
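A hedged sketch of what greedy layer-wise training of stacked auto-encoders can look like (the layer sizes, learning rate, and sigmoid units below are my assumptions, not details from the lecture): each layer is trained to reconstruct its own input, and its codes become the training data for the next layer.

```python
# Illustrative greedy layer-wise training of stacked auto-encoders.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(data, n_hidden, lr=0.1, epochs=50, seed=0):
    """One encoder/decoder pair trained to minimize squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W_enc = rng.normal(0, 0.1, (n_visible, n_hidden))
    W_dec = rng.normal(0, 0.1, (n_hidden, n_visible))
    for _ in range(epochs):
        h = sigmoid(data @ W_enc)            # encode
        recon = sigmoid(h @ W_dec)           # decode
        err = recon - data                   # reconstruction error
        # Backprop through the two layers (biases omitted for brevity)
        d_recon = err * recon * (1 - recon)
        d_h = (d_recon @ W_dec.T) * h * (1 - h)
        W_dec -= lr * h.T @ d_recon / len(data)
        W_enc -= lr * data.T @ d_h / len(data)
    return W_enc

def stack_autoencoders(data, layer_sizes):
    """Greedy stacking: each layer is trained on the codes of the layer below."""
    weights, codes = [], data
    for n_hidden in layer_sizes:
        W = train_autoencoder(codes, n_hidden)
        weights.append(W)
        codes = sigmoid(codes @ W)           # feed codes upward
    return weights

# Toy usage: 100 random 20-dimensional inputs, two stacked layers.
features = stack_autoencoders(np.random.rand(100, 20), [10, 5])
```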
The deep belief network vision (Hinton)
• Consider some sense data D
• We imagine our goal is to understand what generated it
• We use a generative model
• Search for the most probable ‘cause’ C of the data
  – The one where p(D|C)p(C) is greatest
• How do we find C?
[Diagram on slide: a ‘Cause’ node generating the ‘Data’]
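As a toy illustration of “the most probable cause” (my own example, not from the slides): by Bayes’ rule p(C|D) is proportional to p(D|C)p(C), so with a small discrete set of candidate causes we can score each one by that product and take the maximum.

```python
# The most probable cause of observed data D is the C maximizing p(D|C) * p(C).
priors = {"cat": 0.3, "dog": 0.6, "fox": 0.1}            # p(C)
likelihoods = {"cat": 0.05, "dog": 0.02, "fox": 0.40}    # p(D|C) for the observed D

best_cause = max(priors, key=lambda c: likelihoods[c] * priors[c])
print(best_cause)   # "fox": 0.40 * 0.1 = 0.04 beats 0.015 (cat) and 0.012 (dog)
```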
One and Two Layer Belief Networks
How should we train such networks?
Stacking RBMs
• ‘Greedy’ layerwise learning of RBMs (sketched below)
  – First learn H0 based on the input.
  – Then learn H1 based on H0.
  – Etc.
  – Then ‘fine tune’, says Hinton.
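A rough sketch of greedy layerwise RBM learning with one-step contrastive divergence (CD-1); the layer sizes, learning rate, and omission of bias terms are my simplifications, not parameters from the lecture. H0 is trained on the input, H1 on the hidden activities of H0, and so on.

```python
# Greedy layer-wise RBM stacking with CD-1 (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

def train_rbm(v_data, n_hidden, lr=0.05, epochs=20):
    """Train one RBM with CD-1; biases omitted to keep the sketch short."""
    n_visible = v_data.shape[1]
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data
        h_prob = sigmoid(v_data @ W)
        # Negative phase: one reconstruction step (alternating Gibbs, k = 1)
        v_recon = sigmoid(sample(h_prob) @ W.T)
        h_recon = sigmoid(v_recon @ W)
        # CD-1 weight update: <v h>_data - <v h>_reconstruction
        W += lr * (v_data.T @ h_prob - v_recon.T @ h_recon) / len(v_data)
    return W

def stack_rbms(data, layer_sizes):
    """Greedy stacking: H0 from the input, H1 from H0, and so on."""
    weights, layer = [], data
    for n_hidden in layer_sizes:
        W = train_rbm(layer, n_hidden)
        weights.append(W)
        layer = sigmoid(layer @ W)     # hidden probabilities feed the next RBM
    return weights

# Toy usage: binary data, two stacked hidden layers (H0, H1).
stack = stack_rbms((np.random.rand(200, 30) > 0.5).astype(float), [20, 10])
```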
Test Procedure
• Generation:
  – Clamp a digit identity.
  – Do ‘alternating Gibbs sampling’ from a random starting image; send the state back down to see what it is like.
• Recognition:
  – Clamp the input pattern on the ‘retina’.
  – Feed up, perform alternating Gibbs sampling at the top levels.
Check out the movie: http://www.cs.toronto.edu/~hinton/digits.html
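A simplified sketch of the generation test (my construction, collapsed to a single RBM over pixels plus label units; the actual model runs Gibbs sampling in the top-level associative memory and then a deterministic down-pass through the lower layers):

```python
# Clamp a one-hot digit label, run alternating Gibbs sampling with the label
# held fixed, and read out the image portion of the visible layer.
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

N_PIXELS, N_LABELS, N_HIDDEN = 784, 10, 500                 # assumed sizes
W = rng.normal(0, 0.01, (N_PIXELS + N_LABELS, N_HIDDEN))    # stands in for trained weights

def generate_digit(digit, n_steps=100):
    label = np.zeros(N_LABELS)
    label[digit] = 1.0                           # clamp the digit identity
    image = rng.random(N_PIXELS)                 # random starting image
    for _ in range(n_steps):
        v = np.concatenate([image, label])
        h = sample(sigmoid(v @ W))               # up
        v = sigmoid(h @ W.T)                     # down
        image = v[:N_PIXELS]                     # keep the image part
        # the label part of v is discarded, so the label stays clamped
    return image.reshape(28, 28)

fantasy = generate_digit(3)
```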
Close Calls (49) and Errors (125) out of 10,000 Test Digits
That’s great, says Yann LeCun…
• But it doesn’t always work so well.
• We need to reduce the Energy (increase the goodness) of the sample data (Y) and decrease the goodness of everything else (Y’).
• But there is too much ‘everything else’ (Y’).
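One common way to make “push the energy of Y down, push the energy of Y’ up” concrete is a margin-based contrastive loss; the snippet below is an illustrative recipe of mine, not something taken from the slides.

```python
# Push the energy of a training sample y down and the energy of a contrastive
# sample y_bad up, but only until a margin is satisfied.
def contrastive_loss(energy, y, y_bad, margin=1.0):
    return energy(y) + max(0.0, margin - energy(y_bad))

# Toy usage with a quadratic "energy" centered at 0:
energy = lambda y: y * y
print(contrastive_loss(energy, 0.1, 0.3))   # 0.01 + max(0, 1 - 0.09) = 0.92
```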
LeCun’s view of Stacked Encoder Networks
• Think of each layer as an encoder-decoder pair learning to minimize its own ‘reconstruction error’ ~ ‘maximize the probability of the training data’
• Starting from this, can we make the encoder/decoder more powerful and also more constrained than an RBM?
Two New Ideas and One Old
• Force the representation to be sparse.
  – Can’t represent too many possibilities, so makes most of the input bad automatically!
  – Just pull down the Energy of the samples and the rest will take care of itself!
• Let the Encoder be as smart as you want it to be.
  – Why just use one feed-forward layer on the encoder side of each layer? Why not use the full potential of a multi-layer network?
• Force invariance by re-using the same weights at many positions across lower layers.
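A minimal sketch of the sparsity idea (the encoder/decoder form and the L1 penalty weight are my assumptions): penalizing the code’s L1 norm limits how many units can be active, so most of the input space cannot be reconstructed well and keeps high energy without explicit contrastive samples.

```python
# Reconstruction error plus an L1 sparsity penalty on the code.
import numpy as np

def sparse_energy(x, W_enc, W_dec, sparsity=0.1):
    code = np.maximum(0.0, x @ W_enc)                 # simple rectified encoder
    recon = code @ W_dec                              # linear decoder
    return np.sum((x - recon) ** 2) + sparsity * np.sum(np.abs(code))

# Toy usage with random weights on a 20-dimensional input.
rng = np.random.default_rng(0)
x = rng.random(20)
print(sparse_energy(x, rng.normal(0, 0.1, (20, 40)), rng.normal(0, 0.1, (40, 20))))
```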
IMAGENET Large Scale Visual Recognition Challenge 2012
• Tasks:
  – Classification
  – Classification with Localization
• Training data: 1.2 M images from 1,000 classes.
  – English setter
  – Granny Smith
  – Ladle
• Validation set: 50,000 images not in training set
• Test set: 100,000 images not in the validation or training set.
• An item is scored as correct if the correct answer is one of the network’s top 5 guesses
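For concreteness, the top-5 scoring rule can be computed as below (my example, not part of the slides): an item counts as correct if the true label appears among the network’s five highest-scoring classes.

```python
import numpy as np

def top5_correct(scores, true_label):
    top5 = np.argsort(scores)[-5:]          # indices of the 5 largest scores
    return true_label in top5

def top5_error_rate(all_scores, labels):
    hits = sum(top5_correct(s, y) for s, y in zip(all_scores, labels))
    return 1.0 - hits / len(labels)

# Toy usage: 1,000-way scores for 3 test items.
rng = np.random.default_rng(0)
scores = rng.random((3, 1000))
print(top5_error_rate(scores, [17, 42, 999]))
```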
The Results

Classification (Team / Error Rate):
• SuperVision: .164
• Runner-Up: .262

Localization (Team / Error Rate):
• SuperVision: .342
• Runner-Up: .500

• SuperVision Team: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton
SuperVision Model:
Our model is a large, deep convolutional neural network trained on raw RGB pixel values. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three globally-connected layers with a final 1000-way softmax. It was trained on two NVIDIA GPUs for about a week. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of convolutional nets. To reduce overfitting in the globally-connected layers we employed hidden-unit "dropout", a recently-developed regularization method that proved to be very effective.
Dropout: For each presentation of an item during learning, force a fraction of the hidden units, chosen at random, to have activation value zero (see the sketch below).
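A minimal sketch of that dropout rule (the 50% drop rate is the commonly used value, assumed here rather than stated on the slide):

```python
import numpy as np

def dropout(activations, rate=0.5):
    """Zero out a random fraction of hidden activations on each presentation."""
    mask = np.random.rand(*activations.shape) >= rate
    return activations * mask          # test-time rescaling is omitted here

hidden = np.random.rand(8)
print(dropout(hidden))   # roughly half the units are forced to zero
```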