Laplacian Regularized Few Shot Learning (LaplacianShot)

Imtiaz Masud Ziko, Jose Dolz, Eric Granger and Ismail Ben Ayed

ETS Montreal

Overview


Few-Shot Learning

- What and Why ?

- Brief discussion on existing approaches.

Proposed LaplacianShot

- The context

- Proposed formulation

- Optimization

- Proposed Algorithm

Experiments

- Experimental Setup

- SOTA results on 5 different few-shot benchmarks.

Few-Shot Learning (An example)

- Given C = 5 classes
- Each class c having 1 example (5-way 1-shot)

Learn a model from these support examples to classify a new query example.

Few-Shot Learning

Humans recognize new classes well from only a few examples.

❏ Modern ML methods generalize poorly when given only a few labeled examples.

❏ Need a better way.

A very large body of recent works, mostly based on:

Meta-learning framework

Meta-Learning Framework

- Training set with enough labeled data (base classes different from the test classes) to learn an initial model
- Create episodes and do episodic training to learn a meta-learner

Vinyals et al. (NeurIPS '16), Snell et al. (NeurIPS '17), Sung et al. (CVPR '18), Finn et al. (ICML '17), Ravi et al. (ICLR '17), Lee et al. (CVPR '19), Hu et al. (ICLR '20), Ye et al. (CVPR '20), ...

Taking a few steps backward ...

Recently [Chen et al., ICLR '19; Wang et al., '19; Dhillon et al., ICLR '20]:

Simple baselines outperform the overly convoluted meta-learning based approaches.

Baseline Framework

No need to meta-train

Simple conventional cross-entropy training

The approaches mostly differ during inference

Inductive vs Transductive inference

Inductive: Predict for one query/test point at a time, given the supports

Examples

- Vinyals et al., NeurIPS '16 (Attention mechanism)
- Snell et al., NeurIPS '17 (Nearest Prototype)

Transductive: Predict for all query/test points jointly, instead of one at a time, given the supports

Examples

- Liu et al., ICLR '19 (Label propagation)
- Dhillon et al., ICLR '20 (Transductive fine-tuning)

Proposed LaplacianShot

- Latent Assignment matrix for N query samples:

- Label assignment for each query:

- And Simplex Constraints:

Laplacian-regularized objective:
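The formulas on this slide were images and did not survive extraction. The block below is a reconstruction that follows the slide labels and the formulation in the LaplacianShot paper. The symbols (y_q, x_q, m_c, d, w, λ) are assumptions about notation, not text recovered from the slides.

```latex
% Latent assignment matrix for the N query samples
\mathbf{Y} = [\mathbf{y}_1, \dots, \mathbf{y}_N]^{\top}

% Label assignment for each query: a binary vector over the C classes
\mathbf{y}_q = [y_{q,1}, \dots, y_{q,C}]^{\top} \in \{0,1\}^{C}

% Simplex constraint: each query is assigned to exactly one class
\mathbf{1}^{\top}\mathbf{y}_q = 1

% Laplacian-regularized objective
\mathcal{E}(\mathbf{Y}) = \mathcal{N}(\mathbf{Y}) + \frac{\lambda}{2}\,\mathcal{L}(\mathbf{Y}),
\qquad
\mathcal{N}(\mathbf{Y}) = \sum_{q=1}^{N}\sum_{c=1}^{C} y_{q,c}\, d(\mathbf{x}_q - \mathbf{m}_c),
\qquad
\mathcal{L}(\mathbf{Y}) = \frac{1}{2}\sum_{q,p} w(\mathbf{x}_q,\mathbf{x}_p)\,\lVert \mathbf{y}_q - \mathbf{y}_p \rVert^{2}
```

Here x_q is the feature embedding of query q, m_c the prototype of class c, d a distance, and w a pairwise similarity between queries.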

Laplacian-regularized objective, term by term:

Nearest Prototype classification: without the Laplacian term, similar to ProtoNet (Snell '17) or SimpleShot (Wang '19)

Laplacian Regularization: well known in graph Laplacian methods: spectral clustering (Shi '00, Von Luxburg '07), SLK (Ziko '18), SSL (Weston '12, Belkin '06)

LaplacianShot Takeaways

✓ SOTA results without bells and whistles.

✓ Simple constrained graph clustering works very well.

✓ No network fine-tuning and no meta-learning

✓ Fast transductive inference: almost as fast as inductive inference

✓ Model agnostic

LaplacianShot: More Details

Proposed LaplacianShot

Laplacian-regularized objective:

Nearest Prototype classification

- Feature embedding:

- Prototype can be:
  - The support example in 1-shot, or
  - Simple mean from support examples, or
  - Weighted mean from both support and initially predicted query samples

When the Laplacian term is absent: labeling according to nearest support prototypes
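The nearest-prototype term can be illustrated with a minimal NumPy sketch. It assumes the "simple mean" prototype option and a squared Euclidean distance; the function names and toy features are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def class_prototypes(support_feats, support_labels, num_classes):
    """Mean support embedding per class (in 1-shot this is the support itself)."""
    return np.stack([
        support_feats[support_labels == c].mean(axis=0)
        for c in range(num_classes)
    ])

def nearest_prototype_labels(query_feats, prototypes):
    """Assign each query to its closest prototype (squared Euclidean distance)."""
    diffs = query_feats[:, None, :] - prototypes[None, :, :]   # (N, C, D)
    dists = (diffs ** 2).sum(axis=-1)                          # (N, C)
    return dists.argmin(axis=1), dists

# Toy 2-way 2-shot task with 2-D features (made-up numbers).
support_feats = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]])
support_labels = np.array([0, 0, 1, 1])
query_feats = np.array([[0.1, 0.1], [4.8, 5.1]])

prototypes = class_prototypes(support_feats, support_labels, num_classes=2)
labels, dists = nearest_prototype_labels(query_feats, prototypes)
```

The `dists` matrix is the unary cost of assigning each query to each class; taking the row-wise argmin is the inductive, one-query-at-a-time prediction.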

Laplacian Regularization (well known in graph Laplacian methods):

Encourages nearby points (high pairwise similarity) to have similar assignments
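A small sketch of how the Laplacian term penalizes inconsistent labels among neighbors. The slide does not say how the pairwise similarity is built; a symmetric binary k-nearest-neighbor affinity is assumed here, and all names and data are hypothetical.

```python
import numpy as np

def knn_affinity(feats, k):
    """Binary affinity: w[q, p] = 1 if p is among the k nearest neighbors of q, then symmetrized."""
    n = feats.shape[0]
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]        # indices of the k closest points
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return np.maximum(w, w.T)

def laplacian_term(w, y):
    """0.5 * sum_{q,p} w[q, p] * ||y_q - y_p||^2 for an (N, C) assignment matrix y."""
    d2 = ((y[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return 0.5 * (w * d2).sum()

# Two well-separated clusters of three points each (made-up features).
feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
w = knn_affinity(feats, k=2)

one_hot = np.eye(2)
y_consistent = one_hot[[0, 0, 0, 1, 1, 1]]    # neighbors share a label
y_mixed = one_hot[[0, 1, 0, 1, 1, 1]]         # one point disagrees with its neighbors

penalty_consistent = laplacian_term(w, y_consistent)
penalty_mixed = laplacian_term(w, y_mixed)
```

The penalty is zero when every pair of neighbors shares a label and grows with each neighboring pair that disagrees.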

Proposed Optimization

Laplacian-regularized objective:

Tricky to optimize due to:

✖ Simplex/Integer Constraints.

✖ Laplacian over discrete variables.

Relax integer constraints:

➢ Convex quadratic problem

✖ Requires solving for the N×C variables all together

✖ Extra projection steps for the simplex constraints

We do:

✓ Concave relaxation

✓ Independent and closed-form updates for each assignment variable

✓ Efficient bound optimization

Concave Laplacian

- Expand the Laplacian term for the two cases: assignments equal / not equal
- The sum of affinities gives the degree term
- Remove constant terms
- The remaining term is concave for a PSD matrix
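The equations of this derivation were lost in extraction. One way to write the steps named on the slides is sketched below, assuming binary assignments on the simplex (so that each y_q has unit norm) and writing w_qp for the pairwise similarity and W for the matrix collecting these similarities; the notation is an assumption.

```latex
\begin{align*}
\mathcal{L}(\mathbf{Y})
  &= \tfrac{1}{2}\sum_{q,p} w_{qp}\,\lVert \mathbf{y}_q - \mathbf{y}_p \rVert^{2}
   = \tfrac{1}{2}\sum_{q,p} w_{qp}\left(\mathbf{y}_q^{\top}\mathbf{y}_q
     + \mathbf{y}_p^{\top}\mathbf{y}_p - 2\,\mathbf{y}_q^{\top}\mathbf{y}_p\right) \\
  % binary simplex vectors satisfy y^T y = 1, so a pair contributes
  % 0 when y_q = y_p and 2 when y_q \neq y_p
  &= \sum_{q} d_q - \sum_{q,p} w_{qp}\,\mathbf{y}_q^{\top}\mathbf{y}_p,
     \qquad d_q = \sum_{p} w_{qp} \quad \text{(degree)} \\
  % the degree sum does not depend on Y and can be dropped
  &\overset{c}{=} -\sum_{q,p} w_{qp}\,\mathbf{y}_q^{\top}\mathbf{y}_p
   = -\operatorname{tr}\!\left(\mathbf{Y}^{\top}\mathbf{W}\,\mathbf{Y}\right)
\end{align*}
```

The last expression is a negated quadratic form, so it is concave in Y whenever W is positive semidefinite.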

Concave-Convex relaxation

Convex barrier function:

● Avoids extra dual variables for the non-negativity constraints
● Closed-form update for the dual variable of the simplex constraint

Putting it all together

Bound optimization

- First-order approximation of the concave term; the unary term is fixed
- We get an iterative tight upper bound, which is optimized iteratively
- The upper bound is independent for each assignment variable
- Minimizing the independent upper bound: KKT conditions bring closed-form updates

LaplacianShot Algorithm
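The algorithm figure is not reproduced in this transcript. Below is a minimal sketch of the iterative closed-form updates described in the bound-optimization slides. It assumes the softmax-style update reported in the LaplacianShot paper, with the assignments initialized from the unary distances alone; the function names, `lam`, the stopping rule, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def laplacian_shot_updates(unary, w, lam=1.0, max_iter=20, tol=1e-6):
    """unary: (N, C) query-to-prototype distances; w: (N, N) pairwise affinities.
    Returns relaxed assignments whose rows lie on the simplex."""
    y = softmax(-unary)                       # soft nearest-prototype initialization
    prev_bound = np.inf
    for _ in range(max_iter):
        pairwise = w @ y                      # sum_p w[q, p] * y_p from the current iterate
        y = softmax(-unary + lam * pairwise)  # independent closed-form update per query
        # Bound value at the new iterate, used here only as a stopping criterion.
        bound = (y * (np.log(np.maximum(y, 1e-20)) + unary - lam * pairwise)).sum()
        if abs(prev_bound - bound) <= tol * abs(bound):
            break
        prev_bound = bound
    return y

# Toy task: 4 queries, 2 classes. Query 2 is slightly closer to the wrong
# prototype but is a neighbor of queries 0 and 1 (made-up numbers).
unary = np.array([[0.1, 2.0], [0.2, 2.0], [1.1, 1.0], [2.0, 0.1]])
w = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)

y = laplacian_shot_updates(unary, w, lam=1.0)
transductive_labels = y.argmax(axis=1)
inductive_labels = unary.argmin(axis=1)
```

In this toy case the nearest-prototype rule alone labels query 2 as class 1, while the neighbor term pulls it to class 0. Each update touches one row at a time through a softmax, so no projection onto the simplex is needed.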

Experiments

Datasets:

1. Mini-ImageNet

2. Tiered-ImageNet

3. CUB 200-2011

4. iNat

Generic Classification

- miniImageNet splits: 64 base, 16 validation and 20 test classes
- tieredImageNet splits: 351 base, 97 validation and 160 test classes

Fine-Grained Classification (CUB 200-2011)

- Splits: 100 base, 50 validation and 50 test classes

Evaluation protocol (Mini-ImageNet, Tiered-ImageNet, CUB):

- 5-way 1-shot/5-shot
- 15 query samples per class (N=75)
- Average accuracy over 10,000 few-shot tasks with 95% confidence interval
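As an illustration of this reporting protocol, the sketch below computes a mean accuracy and a 95% confidence interval over many tasks. It assumes a normal-approximation interval (1.96 standard errors), which the slides do not specify, and the per-task accuracies are synthetic numbers generated only to exercise the function.

```python
import numpy as np

def mean_and_ci95(task_accuracies):
    """Mean accuracy and half-width of a normal-approximation 95% confidence interval."""
    acc = np.asarray(task_accuracies, dtype=float)
    mean = acc.mean()
    half_width = 1.96 * acc.std(ddof=1) / np.sqrt(acc.size)
    return mean, half_width

# Synthetic stand-in for 10,000 tasks with 75 queries each (not real results).
rng = np.random.default_rng(0)
fake_accs = rng.binomial(n=75, p=0.7, size=10_000) / 75.0

mean, half_width = mean_and_ci95(fake_accs)
print(f"{100 * mean:.2f} +/- {100 * half_width:.2f}")
```

With 10,000 tasks the interval is narrow, which is why small differences between methods can still be compared under this protocol.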

iNat:

- More realistic and challenging
- Recently introduced (Wertheimer & Hariharan, 2019)
- Slight class distinction
- Imbalanced class distribution with variable number of supports/query per class

Evaluation protocol (iNat):

- 227-way multi-shot
- Top-1 accuracy averaged over the test images per class (Per Class)
- Top-1 accuracy averaged over all the test images (Mean)
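The two metrics differ when classes are imbalanced. A minimal sketch, with a hypothetical helper name and made-up labels:

```python
import numpy as np

def mean_and_per_class_accuracy(y_true, y_pred):
    """Return (accuracy over all test images, accuracy averaged per class)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = (y_true == y_pred)
    mean_acc = correct.mean()
    per_class_acc = np.mean([correct[y_true == c].mean() for c in np.unique(y_true)])
    return mean_acc, per_class_acc

# Imbalanced toy case: four images of class 0, one image of class 1.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]
mean_acc, per_class_acc = mean_and_per_class_accuracy(y_true, y_pred)
```

Always predicting the majority class scores well on the mean over images but poorly on the per-class average, which is why both numbers are reported for an imbalanced benchmark.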

We do cross-entropy training with base classes

LaplacianShot during inference

Results (Mini-ImageNet)

Results (Tiered-ImageNet)

Results (CUB and Cross Domain)

Results (iNat)

Ablation: Choosing

Ablation: Convergence

Ablation: Average Inference time

Transductive

LaplacianShot Takeaways

✓ SOTA results without bells and whistles.

✓ Simple constrained graph clustering works very well.

✓ No network fine-tuning and no meta-learning

✓ Fast transductive inference: almost as fast as inductive inference

✓ Model agnostic: can be applied during inference on top of any trained model, with gains of up to 4-5%

Thank you

Code on: https://github.com/imtiazziko/LaplacianShot