Transcript of CS330 Paper Presentation: October 16th, 2019
CS330 Paper Presentation: October 16th, 2019
Supervised Classification
Semi-Supervised Classification: A more realistic dataset
Labelled
Unlabelled
Semi-Supervised Classification: The most “biologically plausible” learning regime
A familiar problem:
?
Few-shot, multi-task learning: Generalize to unseen classes
A new twist on a familiar problem:
?
How can we leverage unlabelled data for few-shot classification?
Unlabelled data may come from the support set or not (distractors)
Strategy: As we can now appreciate, there are a number of possible ways to approach the original problem. To name a few:
● Siamese Networks (Koch et al., 2015)
● Matching Networks (Vinyals et al., 2016)
● Prototypical Networks (Snell et al., 2017)
● Weight initialization / update-step learning (Ravi et al., 2017; Finn et al., 2017)
● MANN (Santoro et al., 2016)
● Temporal convolutions (Mishra et al., 2017)
All are reasonable starting points for the semi-supervised few-shot classification problem!
Prototypical Networks (Snell et al., 2017): Very simple inductive bias!
Prototypical Networks (Snell et al., 2017)
For each class, compute prototype
The embedding is generated via a simple convnet: Pixels → 64 [3×3] filters → BatchNorm → ReLU → [2×2] MaxPool → 64-D vector
https://jasonyzhang.com/convnet/
Prototypical Networks (Snell et al., 2017)
For each class, compute prototype
Softmax distribution of distances to prototypes for new image
Compute loss
Very simple inductive bias: Reduces to a linear model with Euclidean distance
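The pipeline above (average embedded support points into per-class prototypes, softmax over distances to the prototypes, cross-entropy loss) can be sketched in a few lines. This is a minimal NumPy sketch, assuming the convnet embedding has already been applied; the function names are mine, not the paper's:

```python
import numpy as np

def compute_prototypes(z_support, y_support, n_classes):
    """Prototype p_c = mean of the embedded support points of class c."""
    return np.stack([z_support[y_support == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(z_query, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    d2 = ((z_query[None, :] - protos) ** 2).sum(axis=1)
    logits = -d2
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

def loss(z_query, y_query, protos):
    """Negative log-probability of the true class (cross-entropy)."""
    return -np.log(classify(z_query, protos)[y_query])
```

Using the squared Euclidean distance is what gives the linear-model reduction mentioned on the slide: expanding the square leaves logits that are linear in the query embedding.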
Strategy for semi-supervised: Refine prototype centers with unlabelled data.
Support
Unlabelled
Test
Strategy for semi-supervised:
1. Start with labelled prototypes
2. Give each unlabelled input a partial assignment to each cluster
3. Incorporate unlabelled examples into original prototype
Prototypical networks with Soft k-means
Unlabelled support set
Partial Assignment
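The partial-assignment update can be sketched as one step of soft k-means in NumPy (a sketch only; the variable names are mine). Each unlabelled point gets a soft membership in every cluster, and the refined prototype averages the hard-assigned support points together with the softly weighted unlabelled points:

```python
import numpy as np

def soft_assign(z_unlabelled, protos):
    """Partial assignment w[j, c]: softmax over negative squared distances."""
    d2 = ((z_unlabelled[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)    # shape (M, C); rows sum to 1

def refine_prototypes(z_support, y_support, z_unlabelled, protos):
    """One soft k-means step: fold the weighted unlabelled points into each
    prototype alongside the (hard-assigned) labelled support points."""
    w = soft_assign(z_unlabelled, protos)
    refined = []
    for c in range(len(protos)):
        members = z_support[y_support == c]
        num = members.sum(axis=0) + (w[:, c:c + 1] * z_unlabelled).sum(axis=0)
        den = len(members) + w[:, c].sum()
        refined.append(num / den)
    return np.stack(refined)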
What about distractor classes?
Prototypical networks with Soft k-means
Add a buffering prototype at the origin to “capture the distractors”
Prototypical networks with Soft k-means w/ Distractor Cluster
Assumption: Distractors all come from one class!
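Under that single-distractor-class assumption, the change to the refinement step is tiny: append an all-zeros prototype before the soft assignment so distractors have somewhere to go, then discard it. A NumPy sketch (names are mine; the soft assignment mirrors the previous slide):

```python
import numpy as np

def refine_with_distractor_cluster(z_support, y_support, z_unlabelled, protos):
    """Soft k-means refinement with one extra 'catch-all' prototype at the
    origin that absorbs distractors; it is dropped from the output."""
    protos_aug = np.vstack([protos, np.zeros((1, protos.shape[1]))])
    # Partial assignment over C+1 clusters (softmax over -dist^2)
    d2 = ((z_unlabelled[:, None, :] - protos_aug[None, :, :]) ** 2).sum(axis=-1)
    e = np.exp(-d2 + d2.min(axis=1, keepdims=True))   # stable softmax
    w = e / e.sum(axis=1, keepdims=True)
    refined = []
    for c in range(len(protos)):                      # real classes only
        members = z_support[y_support == c]
        num = members.sum(axis=0) + (w[:, c:c + 1] * z_unlabelled).sum(axis=0)
        den = len(members) + w[:, c].sum()
        refined.append(num / den)
    return np.stack(refined)                          # distractor row discarded
```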
Soft k-means + Masking Network
1. Distance
2. Compute mask with small network
Soft k-means + Masking Network
differentiable
Soft k-means + Masking
In practice, the MLP is a single dense layer with 20 hidden units (tanh nonlinearity)
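One way to sketch the masking idea: for each cluster, an MLP (the slide's 20-unit tanh layer) maps statistics of the normalized distances to a soft threshold and slope, and each unlabelled example is masked by a sigmoid of its distance relative to that threshold. This is a rough NumPy sketch with untrained placeholder weights; the particular statistics fed to the MLP are my assumption, not the paper's exact recipe:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def distance_masks(z_unlabelled, protos, W1, b1, W2, b2):
    """For each cluster c, map statistics of the normalized distances through
    a small tanh MLP to a soft threshold beta_c and slope gamma_c, then mask
    each example j with sigmoid(-gamma_c * (d[j, c] - beta_c)).
    The statistics used here (min, max, mean, var) are an assumption."""
    d = np.sqrt(((z_unlabelled[:, None, :] - protos[None, :, :]) ** 2).sum(-1))
    d = d / d.mean()                                   # normalize distances
    stats = np.stack([d.min(0), d.max(0), d.mean(0), d.var(0)], axis=1)  # (C, 4)
    h = np.tanh(stats @ W1 + b1)                       # (C, 20) hidden layer
    beta, gamma = (h @ W2 + b2).T                      # per-cluster threshold/slope
    return sigmoid(-gamma[None, :] * (d - beta[None, :]))   # (M, C), in (0, 1)

# Placeholder (untrained) weights; in the paper these are learned end-to-end,
# since the whole masking computation is differentiable.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 20)) * 0.1, np.zeros(20)
W2, b2 = rng.normal(size=(20, 2)) * 0.1, np.zeros(2)
```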
Datasets
● Omniglot
● miniImageNet (100 classes with 600 images each)
Hierarchical Datasets
Omniglot
tieredImageNet
miniImageNet: Test - electric guitar; Train - acoustic guitar
tieredImageNet: Test - musical instruments; Train - farming equipment
Datasets
● Omniglot
● miniImageNet (100 classes with 600 images each)
● tieredImageNet (34 broad categories, each containing 10 to 30 classes)
10% goes to labelled splits; 90% goes to unlabelled classes and distractors*
*40/60 for miniImageNet
Much less labelled data than standard few-shot approaches!!!
Datasets
N: Classes
K: Labelled samples from each class
M: Unlabelled samples from each of the N classes
H: Distractors (unlabelled samples from classes other than the N)
H = N = 5; M = 5 for training & M = 20 for testing
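The episode composition above can be made concrete with a small sampling sketch (the function name and the toy data layout are mine):

```python
import numpy as np

def sample_episode(examples_by_class, n=5, k=1, m=5, h=5, seed=0):
    """Build one episode: N support classes with K labelled + M unlabelled
    examples each, plus M unlabelled examples from each of H distractor
    classes (classes outside the N)."""
    rng = np.random.default_rng(seed)
    class_ids = rng.permutation(list(examples_by_class))
    support_classes = class_ids[:n]
    distractor_classes = class_ids[n:n + h]
    support, unlabelled = [], []
    for c in support_classes:
        idx = rng.permutation(len(examples_by_class[c]))
        support += [(examples_by_class[c][i], c) for i in idx[:k]]
        unlabelled += [examples_by_class[c][i] for i in idx[k:k + m]]
    for c in distractor_classes:
        idx = rng.permutation(len(examples_by_class[c]))
        unlabelled += [examples_by_class[c][i] for i in idx[:m]]
    return support, unlabelled
```

At training time this would be called with m=5; at meta-test time with m=20, matching the slide.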
Baseline Models
1. Vanilla Protonet
2. Vanilla Protonet + one step of Soft k-means refinement at test time only (supervised embedding)
Results: Omniglot
Results: miniImageNet
Results: tieredImageNet
Results: Other Baselines
Results
Models trained with M = 5
During meta-test: vary the amount of unlabelled examples
Results
Conclusions:
1. Achieve state-of-the-art performance over logical baselines on 3 datasets
2. K-means Masked models perform best with distractors
3. Novel: models extrapolate to increases in the amount of unlabelled data
4. New dataset: tieredImageNet
Critiques:
1. Results are convincing, but the work is actually a relatively straightforward application of (a) Protonets and (b) k-means clustering
2. Model choice: Protonets are very simple. It’s not clear what they gained by the simple inductive bias
3. The presented approach does not generalize well beyond classification problems
Future directions: extension to unsupervised learning
I would be really interested in withholding labels altogether. Can the model learn how many classes there are?
… and correctly classify them?
Thank you!
Supplemental: Accounting for Intra-Cluster Distance