Laplacian Regularized Few Shot Learning (LaplacianShot)

Transcript
Page 1: Laplacian Regularized Few Shot Learning (LaplacianShot)

Imtiaz Masud Ziko, Jose Dolz, Eric Granger and Ismail Ben Ayed

ETS Montreal

Page 2: Overview

Few-Shot Learning

- What and why?

- Brief discussion on existing approaches.

Proposed LaplacianShot

- The context

- Proposed formulation

- Optimization

- Proposed Algorithm

Experiments

- Experimental Setup

- SOTA results on 5 different few-shot benchmarks.

Page 3: Few-Shot Learning (An example)

Page 4: Few-Shot Learning (An example)

- Given C = 5 classes
- Each class c having 1 example
(5-way 1-shot)

Learn a Model

From these

To classify this

Page 5: Few-Shot Learning (An example)

From these


- Given C = 5 classes
- Each class c having 1 example
(5-way 1-shot)

Learn a Model

To classify this

Page 6: Few-Shot Learning

Humans recognize perfectly well from just a few examples


Page 7: Few-Shot Learning

❏ Modern ML methods generalize poorly

❏ Need a better way.

Page 8: Few-shot learning

A very large body of recent works, mostly based on:

Meta-learning framework

Page 9: Meta-Learning Framework

Page 10: Meta-Learning Framework

Training set with enough labeled data (base classes different from the test classes)

Page 11: Meta-Learning Framework

Training set with enough labeled data to learn initial model

Page 12: Meta-Learning Framework

Create episodes and do episodic training to learn the meta-learner

Vinyals et al. (NeurIPS '16), Snell et al. (NeurIPS '17), Sung et al. (CVPR '18), Finn et al. (ICML '17), Ravi et al. (ICLR '17), Lee et al. (CVPR '19), Hu et al. (ICLR '20), Ye et al. (CVPR '20), . . .

Page 13: Taking a few steps backward...

Recently [Chen et al., ICLR '19; Wang et al., '19; Dhillon et al., ICLR '20]:

Simple baselines outperform the overly convoluted meta-learning-based approaches.

Page 14: Baseline Framework

No need to meta-train

Page 15: Baseline Framework

Simple conventional cross-entropy training

The approaches mostly differ during inference

Page 16: Inductive vs Transductive inference

Query/Test point

Examples

Vinyals et al., NeurIPS '16 (attention mechanism)

Snell et al., NeurIPS '17 (nearest prototype)

Supports

Page 17: Inductive vs Transductive inference

Transductive: Predict for all test points, instead of one at a time

Query/Test points

Examples

Liu et al., ICLR '19 (label propagation)

Dhillon et al., ICLR '20 (transductive fine-tuning)

Supports

Page 18: Proposed LaplacianShot

- Latent Assignment matrix for N query samples:

- Label assignment for each query:

- And Simplex Constraints:

Laplacian-regularized objective:
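
For reference, the equations elided from this slide can be written out as follows. This is a reconstruction in our own notation of the formulation described in the paper, so treat the symbol choices as assumptions:

Y = [y_1, \dots, y_N]^\top, \qquad y_q \in \{0,1\}^C, \qquad \mathbf{1}^\top y_q = 1
\quad (\text{relaxed to } y_q \ge 0)

E(Y) \;=\; \sum_{q=1}^{N} y_q^\top a(x_q) \;+\; \frac{\lambda}{2} \sum_{q,p} w(x_q, x_p)\, \| y_q - y_p \|^2

Here a(x_q) stacks the distances from the embedded query f(x_q) to the C class prototypes, and w(x_q, x_p) is a pairwise similarity between query features.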

Page 19: Proposed LaplacianShot

Laplacian-regularized objective:

Nearest Prototype classification

When λ = 0: similar to ProtoNet (Snell '17) or SimpleShot (Wang '19)

Laplacian Regularization
Well known in graph Laplacian methods: spectral clustering (Shi '00, von Luxburg '07), SLK (Ziko '18), SSL (Weston '12, Belkin '06)

Page 20: LaplacianShot Takeaways

✓ SOTA results without bells and whistles.

✓ Simple constrained graph clustering works very well.

✓ No network fine-tuning, nor meta-learning

✓ Fast transductive inference: almost inductive time

✓ Model Agnostic

Page 21: LaplacianShot

More Details

Page 22: Proposed LaplacianShot

Laplacian-regularized objective:

Nearest Prototype classification

- Feature embedding:

- Prototype can be (sketched below):
  - The support example (in 1-shot), or
  - The simple mean of the support examples, or
  - A weighted mean of both the support and the initially predicted query samples

When λ = 0: labeling according to the nearest support prototypes
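
As a concrete sketch of the prototype options (our notation, not from the slide): with S_c the support set of class c and f_θ the feature extractor, the simple-mean prototype is

m_c = \frac{1}{|S_c|} \sum_{x_s \in S_c} f_\theta(x_s)

In 1-shot this reduces to the single support embedding; the weighted variant additionally averages in the query features initially predicted as class c (the exact weighting follows the paper and is not reproduced here).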

Page 23: Proposed LaplacianShot

Laplacian-regularized objective:

Laplacian Regularization
Well known in graph Laplacian methods:

Encourages nearby points to have similar assignments

Pairwise similarity
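
Written out, the regularizer is the pairwise term of the objective above:

\frac{\lambda}{2} \sum_{q,p} w(x_q, x_p)\, \| y_q - y_p \|^2

A typical choice for the pairwise similarity (an assumption here, not stated on the slide) is a binary k-nearest-neighbour affinity over the query features, so that only nearby points are encouraged to share a label.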

Page 24: Proposed Optimization

Laplacian-regularized objective:

Tricky to optimize due to:

Page 25: Proposed Optimization

Laplacian-regularized objective:

Tricky to optimize due to:

✖ Simplex/Integer Constraints.

Page 26: Proposed Optimization

Laplacian-regularized objective:

Tricky to optimize due to:

✖ Laplacian over discrete variables.

Page 27: Proposed Optimization

Laplacian-regularized objective:

Relax integer constraints:

➢ Convex quadratic problem

✖ Requires solving for all N×C variables together

✖ Extra projection steps for the simplex constraints

Page 28: Proposed Optimization

Laplacian-regularized objective:

We do:

✓ Concave relaxation
✓ Independent and closed-form updates for each assignment variable
✓ Efficient bound optimization

Page 29: Concave Laplacian

Page 30: Concave Laplacian

When the two assignments are equal:

Page 31: Concave Laplacian

When the two assignments are not equal:

Page 32: Concave Laplacian

When the two assignments are not equal: the degree term

Page 33: Concave Laplacian

Remove constant terms

Page 34: Concave Laplacian

Concave for PSD matrix
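
A reconstruction of the derivation sketched over the last few slides, for one-hot assignments y_q:

\frac{1}{2}\sum_{q,p} w_{qp}\,\| y_q - y_p \|^2
= \frac{1}{2}\sum_{q,p} w_{qp}\left( y_q^\top y_q + y_p^\top y_p - 2\, y_q^\top y_p \right)
= \sum_{q} d_q \;-\; \sum_{q,p} w_{qp}\, y_q^\top y_p,
\qquad d_q = \sum_{p} w_{qp}

The pairwise term is 0 when the two assignments are equal and constant otherwise, since y_q^T y_q = 1 for one-hot vectors. Dropping the constant degree term \sum_q d_q leaves -\sum_{q,p} w_{qp} y_q^\top y_p = -\mathrm{tr}(Y^\top W Y), which is concave in the relaxed Y whenever the affinity matrix W is positive semi-definite.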

Page 35: Concave-Convex relaxation

Convex barrier function:

● Avoids extra dual variables
● Closed-form update for the simplex-constraint dual

Putting it all together
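
One plausible form of the resulting concave-convex objective, assuming a negative-entropy barrier (a standard choice for obtaining closed-form simplex updates; treat the exact barrier as an assumption):

\min_{Y}\; \sum_{q} y_q^\top a(x_q)
\;-\; \lambda \sum_{q,p} w_{qp}\, y_q^\top y_p
\;+\; \sum_{q} y_q^\top \log y_q,
\qquad y_q \ge 0,\;\; \mathbf{1}^\top y_q = 1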

Page 36: Bound optimization

First-order approximation of the concave term

Fixed unary
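
The bound-optimization step follows the standard majorize-minimize argument: since the relaxed Laplacian term is concave, its first-order expansion at the current iterate is a tight upper bound, while the (already linear) unary term is kept fixed:

g(Y) \;\le\; g(Y^{(i)}) + \langle \nabla g(Y^{(i)}),\; Y - Y^{(i)} \rangle, \qquad \text{with equality at } Y = Y^{(i)}

Replacing the concave term by this linear surrogate makes the objective separable over the individual assignment vectors y_q, which is what allows the independent closed-form updates on the next slides.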

Page 37: Bound optimization

We get an iterative tight upper bound:

Where:

Iteratively optimize:

Page 38: Bound optimization

Independent upper bound:

Page 39: Bound optimization

Minimize the independent upper bound:

KKT conditions bring closed-form updates:
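
For concreteness, the closed-form update that typically results from the KKT conditions with an entropy barrier is a softmax over classes (a reconstruction consistent with the relaxation above; indices and signs are ours):

y_{qc}^{(i+1)} \;=\;
\frac{\exp\!\left(-a_c(x_q) + \lambda \sum_{p} w_{qp}\, y_{pc}^{(i)}\right)}
{\sum_{c'} \exp\!\left(-a_{c'}(x_q) + \lambda \sum_{p} w_{qp}\, y_{pc'}^{(i)}\right)}

That is, each query's assignment is updated independently from its prototype distances and the current assignments of its neighbours.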

Page 40: LaplacianShot Algorithm
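
The algorithm box itself is an image in the slide, so here is a minimal, hedged Python sketch of the inference procedure described above (all names are ours, and details such as feature rectification and the exact affinity are assumed choices):

import numpy as np

def laplacian_shot(query_feats, prototypes, lam=1.0, k=3, n_iters=20):
    # query_feats: (N, D) embedded query samples; prototypes: (C, D) class prototypes.
    # Unary costs: squared Euclidean distance from each query to each prototype.
    a = ((query_feats[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (N, C)

    # Pairwise affinities: binary kNN graph over the query features (assumed choice).
    d2 = ((query_feats[:, None, :] - query_feats[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    W = np.zeros_like(d2)
    rows = np.arange(d2.shape[0])[:, None]
    W[rows, np.argsort(d2, axis=1)[:, :k]] = 1.0

    # Initialize soft assignments from the unary costs alone.
    Y = np.exp(-a)
    Y /= Y.sum(axis=1, keepdims=True)

    # Bound-optimization iterations: independent softmax updates per query.
    for _ in range(n_iters):
        logits = -a + lam * (W @ Y)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        Y = np.exp(logits)
        Y /= Y.sum(axis=1, keepdims=True)

    return Y.argmax(axis=1)  # hard labels for the N query samples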

Page 41: Experiments

Datasets:

1. Mini-ImageNet

2. Tiered-ImageNet

3. CUB-200-2011

4. iNat

Generic Classification

miniImageNet splits: 64 base, 16 validation and 20 test classes
tieredImageNet splits: 351 base, 97 validation and 160 test classes

Fine-Grained Classification

Splits: 100 base, 50 validation and 50 test classes

Page 42: Experiments

Datasets:

1. Mini-ImageNet

2. Tiered-ImageNet

3. CUB-200-2011

4. iNat

Evaluation protocol:

- 5-way 1-shot/5-shot
- 15 query samples per class (N = 75)
- Average accuracy over 10,000 few-shot tasks, with 95% confidence intervals
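
As a concrete illustration of this protocol (a hypothetical helper; features_by_class is assumed to map each test class to a list of its embedded samples), one such task could be sampled as:

import random

def sample_task(features_by_class, n_way=5, n_shot=1, n_query=15):
    # Pick n_way test classes, then n_shot support and n_query query samples per class.
    classes = random.sample(sorted(features_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        chosen = random.sample(features_by_class[cls], n_shot + n_query)
        support += [(x, label) for x in chosen[:n_shot]]
        query += [(x, label) for x in chosen[n_shot:]]
    return support, query  # |support| = n_way*n_shot, |query| = n_way*n_query (N = 75 here)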

Page 43: Experiments

Datasets:

1. Mini-ImageNet

2. Tiered-ImageNet

3. CUB-200-2011

4. iNat

- More realistic and challenging
- Recently introduced (Wertheimer & Hariharan, 2019)
- Subtle class distinctions
- Imbalanced class distribution with a variable number of support/query samples per class

Page 44: Experiments

Datasets:

1. Mini-ImageNet

2. Tiered-ImageNet

3. CUB-200-2011

4. iNat

Evaluation protocol:

- 227-way multi-shot
- Top-1 accuracy averaged over the test images per class (Per Class)
- Top-1 accuracy averaged over all the test images (Mean)

Page 45: Experiments

We do cross-entropy training with the base classes

LaplacianShot during inference

Page 46: Results (Mini-ImageNet)

Page 47: Results (Mini-ImageNet)

Page 48: Results (Tiered-ImageNet)

Page 49: Results (CUB)

Cross Domain

Page 50: Results (iNat)

Page 51: Ablation: Choosing

Page 52: Ablation: Convergence

Page 53: Ablation: Average Inference time

Transductive

Page 54: LaplacianShot Takeaways

✓ SOTA results without bells and whistles.

✓ Simple constrained graph clustering works very well.

✓ No network fine-tuning, nor meta-learning

✓ Fast transductive inference: almost inductive time

✓ Model agnostic: works during inference with any training model, with gains of up to 4-5%!

Page 55: Thank you

Code on: https://github.com/imtiazziko/LaplacianShot