HardNet: Convolutional Network for Local Image Description

HARDNET: CONVOLUTIONAL NETWORK FOR LOCAL IMAGE DESCRIPTION. Anastasiia Mishchuk, Dmytro Mishkin, Filip Radenovic, Jiri Matas

Transcript of HardNet: Convolutional Network for Local Image Description

Page 1: HardNet: Convolutional Network for Local Image Description

HARDNET: CONVOLUTIONAL NETWORK FOR LOCAL IMAGE DESCRIPTION

Anastasiia Mishchuk, Dmytro Mishkin, Filip Radenovic, Jiri Matas

Page 2: HardNet: Convolutional Network for Local Image Description

OUTLINE

• Short review of methods for learning local descriptors
• The L2-Net
• HardNet loss and architecture
• Benchmarks

Page 3: HardNet: Convolutional Network for Local Image Description

TRAINING DATA

Discriminant Learning of Local Image Descriptors, Brown et al, PAMI 2010

3 sets, 400k patches each:
• Liberty (shown)
• Notredame
• Yosemite

Size: 64x64, grayscale. Obtained from SfM model: 3D point → DoG keypoints

Used by all learned descriptors mentioned in this presentation

Page 4: HardNet: Convolutional Network for Local Image Description

CONVEXOPT (SIMONYAN ET AL, 2012)

Global margin loss

Simonyan et al, ECCV 2012

Convex optimization problem

Page 5: HardNet: Convolutional Network for Local Image Description

MATCHNET

Han et al, CVPR2015.

Works well, but relies on a metric network. Approximate kNN methods, e.g. FLANN, cannot be applied directly

[Figure: MatchNet architecture diagram. Layer labels recoverable from the slide, in approximate order: 7x7 Conv (pad 1) + ReLU; 2x2 MP/2; 5x5 Conv (pad 2) + ReLU; 2x2 MP/2; 3x3 Conv (pad 1) + ReLU, three times; 3x3 MP/2; 8x8 Conv + ReLU; 1x1 Conv + ReLU, twice; 1x1 Conv + Softmax with 2 outputs.]

Page 6: HardNet: Convolutional Network for Local Image Description

DEEPCOMPARE

Zagoruyko and Komodakis, CVPR 2015

Works well, but relies on a metric network. Approximate kNN methods, e.g. FLANN, cannot be applied directly

[Figure: DeepCompare architecture diagram. Layer labels recoverable from the slide, in approximate order: 7x7 Conv (pad 3) + ReLU; 2x2 MP/2; 5x5 Conv (pad 2) + ReLU; 2x2 MP/2; 3x3 Conv (pad 1) + ReLU; 2x2 MP/2; 8x8 Conv + ReLU; 1x1 Conv + ReLU; 1x1 Conv + Sigmoid with 1 output.]
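The remark about metric networks and kNN search can be illustrated with a small sketch. It assumes descriptors are plain lists of floats; `pair_score` is a hypothetical stand-in for a learned metric network and is not part of MatchNet or DeepCompare. When similarity is an L2 distance, the exhaustive loop in `match_by_l2` could be replaced by a kd-tree or another approximate index. When similarity is an arbitrary learned function of the pair, every pair has to be scored.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_by_l2(queries, database):
    """Index of the nearest database descriptor for each query (brute force)."""
    return [min(range(len(database)), key=lambda j: l2(q, database[j]))
            for q in queries]

def match_by_pair_score(queries, database, pair_score):
    """Matching with a pairwise scorer: all query/database pairs are scored."""
    return [max(range(len(database)), key=lambda j: pair_score(q, database[j]))
            for q in queries]

database = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.8]]
print(match_by_l2(queries, database))
```

Both functions are exhaustive here; the difference is that only the first one has a drop-in replacement based on spatial indexing.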

Page 7: HardNet: Convolutional Network for Local Image Description

DeepDesc (Simo-Serra et al, ICCV 2015)
• Relatively shallow and fast CNN
• Hard negative mining
• Contrastive loss on L2 distance

[Figure: DeepDesc architecture diagram. Layer labels recoverable from the slide: 64x64 input; 7x7 Conv + TanH (32 channels); 2x2 L2Pool/2; 6x6 Conv + TanH (64 channels); 3x3 L2Pool/3; 5x5 Conv + TanH (128 channels); 4x4 L2Pool/4; 128-D output.]

TFeat (Balntas et al, BMVC 2016)
• Even shallower and faster CNN
• Hard-negative mining by anchor swap in triplet
• Triplet margin loss on L2 distance

[Figure: TFeat architecture diagram. Layer labels recoverable from the slide: 32x32 input; 7x7 Conv + TanH (26x26x32); 2x2 MP/2 (13x13); 6x6 Conv + TanH (8x8x64); 8x8 Conv + TanH (1x1x128).]
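The two losses named on this slide can be sketched for a single pair or triplet as follows. This is a minimal illustration, not the authors' code: the margin value of 1.0 and the function names are assumptions, and descriptors are plain lists of floats. With anchor swap, the negative distance is the smaller of anchor-to-negative and positive-to-negative, which makes the triplet harder.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(x1, x2, is_match, margin=1.0):
    """Pull matching pairs together; push non-matching pairs past the margin."""
    d = l2(x1, x2)
    return d if is_match else max(0.0, margin - d)

def triplet_margin_loss(anchor, positive, negative, margin=1.0, anchor_swap=False):
    """max(0, margin + d_pos - d_neg), optionally with anchor swap."""
    d_pos = l2(anchor, positive)
    d_neg = l2(anchor, negative)
    if anchor_swap:
        # Use the positive as anchor if it is closer to the negative.
        d_neg = min(d_neg, l2(positive, negative))
    return max(0.0, margin + d_pos - d_neg)

a, p, n = [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]
print(triplet_margin_loss(a, p, n))                    # no swap
print(triplet_margin_loss(a, p, n, anchor_swap=True))  # with swap
```

In this toy triplet the negative is far from the anchor but close to the positive, so the loss is zero without the swap and positive with it.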

Page 8: HardNet: Convolutional Network for Local Image Description

DESCRIPTOR COMPARISON

Descr.     #layers w/params  Loss            Hard mining  Kd-tree ready
ConvexOpt  1                 Global margin   -            +
DeepDesc   3                 Contrastive     +            +
TFeat      3                 Triplet margin  +/-          +
MatchNet   8                 Cross entropy   -            -
DeepComp   5                 Hinge           -            -

Balntas et al, BMVC 2016

Page 9: HardNet: Convolutional Network for Local Image Description

L2NET. TIAN ET AL (CVPR 2017)

Layer                 Normalization  Output
Input patch           -              32x32x1
3x3 Conv, pad 1       BN + ReLU      32x32x32
3x3 Conv, pad 1       BN + ReLU      32x32x32
3x3 Conv, pad 1, /2   BN + ReLU      16x16x64
3x3 Conv, pad 1       BN + ReLU      16x16x64
3x3 Conv, pad 1, /2   BN + ReLU      8x8x128
3x3 Conv, pad 1       BN + ReLU      8x8x128
8x8 Conv              BN + L2Norm    1x1x128
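The spatial sizes in this architecture can be checked with the standard convolution output-size formula. The sketch below assumes a 32x32 input and reads kernel size, padding, stride and channel count per layer from the slide labels ("/2" is taken to mean stride 2); the names `LAYERS` and `trace_shapes` are illustrative only.

```python
def conv_out(size, kernel, pad=0, stride=1):
    """Output side length of a square convolution."""
    return (size + 2 * pad - kernel) // stride + 1

# (kernel, pad, stride, out_channels) per layer, as read off the slide.
LAYERS = [
    (3, 1, 1, 32),
    (3, 1, 1, 32),
    (3, 1, 2, 64),
    (3, 1, 1, 64),
    (3, 1, 2, 128),
    (3, 1, 1, 128),
    (8, 0, 1, 128),
]

def trace_shapes(input_size=32, layers=LAYERS):
    """Return (height, width, channels) after every layer."""
    shapes = []
    size = input_size
    for kernel, pad, stride, channels in layers:
        size = conv_out(size, kernel, pad, stride)
        shapes.append((size, size, channels))
    return shapes

for shape in trace_shapes():
    print(shape)
```

The final 8x8 convolution without padding collapses the 8x8 feature map into a single 128-D vector, which is then normalized.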

 

Page 10: HardNet: Convolutional Network for Local Image Description

L2NET: LOSS TERMS

Softmax over row/column of distance matrix
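One way to read this term is sketched below. It assumes a batch of n matching pairs, builds the n x n distance matrix between anchors and positives, uses negated distances as logits, and averages the negative log-probability of the diagonal entry over rows and columns. The exact scaling and the distance-to-similarity mapping in L2-Net may differ; the function names are illustrative.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(anchors, positives):
    """dist[i][j] = distance between anchor i and positive j."""
    return [[l2(a, p) for p in positives] for a in anchors]

def log_softmax_of_negated(values, index):
    """log(softmax(-values)[index]), computed with the max-shift trick."""
    logits = [-v for v in values]
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_norm

def softmax_matching_loss(dist):
    """Mean negative log-likelihood of the diagonal over rows and columns."""
    n = len(dist)
    total = 0.0
    for i in range(n):
        row = dist[i]
        col = [dist[k][i] for k in range(n)]
        total += log_softmax_of_negated(row, i) + log_softmax_of_negated(col, i)
    return -total / (2 * n)

print(softmax_matching_loss([[0.0, 10.0], [10.0, 0.0]]))  # well separated
print(softmax_matching_loss([[1.0, 1.0], [1.0, 1.0]]))    # uninformative
```

The loss is close to zero when each matching pair is much closer than every non-matching pair in its row and column, and equals log(n) when all distances are equal.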

Page 11: HardNet: Convolutional Network for Local Image Description

L2NET: LOSS TERMS

Softmax over row/column of distance matrix

Penalty on descriptor components correlation
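A correlation penalty of this kind can be sketched as the sum of squared off-diagonal entries of the correlation matrix between descriptor dimensions, estimated over a batch. This is an assumption about the general form, not a reproduction of the L2-Net implementation; `eps` is added only to avoid division by zero for constant dimensions.

```python
import math

def correlation_penalty(descriptors, eps=1e-12):
    """Sum of squared Pearson correlations between distinct dimensions."""
    n = len(descriptors)
    dim = len(descriptors[0])
    means = [sum(d[k] for d in descriptors) / n for k in range(dim)]
    centered = [[d[k] - means[k] for k in range(dim)] for d in descriptors]

    def dot(i, j):
        # Inner product of centered dimensions i and j across the batch.
        return sum(row[i] * row[j] for row in centered)

    norms = [math.sqrt(dot(k, k)) for k in range(dim)]
    penalty = 0.0
    for i in range(dim):
        for j in range(dim):
            if i != j:
                r = dot(i, j) / (norms[i] * norms[j] + eps)
                penalty += r * r
    return penalty

correlated = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
uncorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
print(correlation_penalty(correlated), correlation_penalty(uncorrelated))
```

The penalty is zero when dimensions are uncorrelated across the batch and grows as dimensions become redundant.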

Page 12: HardNet: Convolutional Network for Local Image Description

L2NET: LOSS TERMS

Softmax over row/column of distance matrix

Softmax over row/column of distance matrix of intermediate features

Penalty on descriptor components correlation

Page 13: HardNet: Convolutional Network for Local Image Description

HARDNET

Triplet margin loss for hard negative

Penalty on descriptor channels correlation

Softmax over row/column of distance matrix of intermediate features
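The triplet margin loss for the hardest negative can be sketched over a batch of n matching pairs. For each pair, the negative distance is the smallest off-diagonal entry in the corresponding row or column of the anchor-to-positive distance matrix. The sketch assumes a margin of 1.0, at least two pairs per batch, and descriptors given as lists of floats; it omits any filtering of near-duplicate negatives that a production implementation might add.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hardest_in_batch_loss(anchors, positives, margin=1.0):
    """Mean triplet margin loss using the closest non-matching descriptor."""
    n = len(anchors)
    dist = [[l2(a, p) for p in positives] for a in anchors]
    total = 0.0
    for i in range(n):
        d_pos = dist[i][i]
        # Closest non-matching positive to anchor i (row i, off-diagonal).
        row_min = min(dist[i][j] for j in range(n) if j != i)
        # Closest non-matching anchor to positive i (column i, off-diagonal).
        col_min = min(dist[k][i] for k in range(n) if k != i)
        d_neg = min(row_min, col_min)
        total += max(0.0, margin + d_pos - d_neg)
    return total / n

easy = hardest_in_batch_loss([[0.0, 0.0], [3.0, 0.0]], [[0.0, 1.0], [3.0, 1.0]])
hard = hardest_in_batch_loss([[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.5], [1.0, 0.5]])
print(easy, hard)
```

Because negatives are mined from the batch itself, a larger batch offers more candidates and therefore harder negatives.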

 

Page 14: HardNet: Convolutional Network for Local Image Description

HARDNET (OURS)

Layer                 Normalization  Output
Input patch           -              32x32x1
3x3 Conv, pad 1       BN + ReLU      32x32x32
3x3 Conv, pad 1       BN + ReLU      32x32x32
3x3 Conv, pad 1, /2   BN + ReLU      16x16x64
3x3 Conv, pad 1       BN + ReLU      16x16x64
3x3 Conv, pad 1, /2   BN + ReLU      8x8x128
3x3 Conv, pad 1       BN + ReLU      8x8x128
8x8 Conv              BN + L2Norm    1x1x128

Page 15: HardNet: Convolutional Network for Local Image Description

BATCH SIZE INFLUENCE

Page 16: HardNet: Convolutional Network for Local Image Description

DESCRIPTOR COMPARISON

Descr.     #layers w/params  Loss            Hard mining  Kd-tree ready
ConvexOpt  1                 Global margin   -            +
DeepDesc   3                 Contrastive     +            +
TFeat      3                 Triplet margin  +/-          +
MatchNet   8                 Cross entropy   -            -
DeepComp   5                 Hinge           -            -
L2Net      7                 SoftMax         +            +
HardNet    7                 Triplet margin  +            +

Page 17: HardNet: Convolutional Network for Local Image Description

Loss comparison on patch triplets

Page 18: HardNet: Convolutional Network for Local Image Description

LOSSES COMPARISON, DERIVATIVES

   

 

Page 19: HardNet: Convolutional Network for Local Image Description

LOSSES COMPARISON, DERIVATIVES

   

No gradient from negative example

Small gradients
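The annotations about missing and small gradients can be made concrete by writing each loss as a function of the positive distance d_pos and the negative distance d_neg and differentiating. The transcript does not preserve which annotation points at which loss, so the sketch below is one reading under assumed forms: a contrastive loss d_pos + max(0, m - d_neg), a triplet margin loss max(0, m + d_pos - d_neg), and a two-way softmin loss; the margin m = 1.0 is also an assumption.

```python
import math

def contrastive_grads(d_pos, d_neg, margin=1.0):
    """(dL/dd_pos, dL/dd_neg) for L = d_pos + max(0, margin - d_neg)."""
    return 1.0, (-1.0 if d_neg < margin else 0.0)

def triplet_margin_grads(d_pos, d_neg, margin=1.0):
    """(dL/dd_pos, dL/dd_neg) for L = max(0, margin + d_pos - d_neg)."""
    active = margin + d_pos - d_neg > 0
    return (1.0, -1.0) if active else (0.0, 0.0)

def softmin_grads(d_pos, d_neg):
    """(dL/dd_pos, dL/dd_neg) for L = -log(e^-d_pos / (e^-d_pos + e^-d_neg))."""
    # Softmax weight assigned to the negative example.
    w_neg = 1.0 / (1.0 + math.exp(d_neg - d_pos))
    return w_neg, -w_neg

# A negative just beyond the contrastive margin, with a non-trivial positive.
print(contrastive_grads(0.5, 1.2))
print(triplet_margin_grads(0.5, 1.2))
print(softmin_grads(0.5, 1.2))
```

Under these assumed forms, a negative farther than the margin contributes no gradient to the contrastive loss, while the triplet margin loss still pushes it away as long as the triplet violates the margin; the softmin gradient shrinks as the negative moves away from the positive.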

Page 20: HardNet: Convolutional Network for Local Image Description

LOSSES COMPARISON

                 Contrastive  Softmax (L2Net)  Triplet margin
FPR, Brown Yos   0.009        0.009            0.006
mAUC, W1BS       0.072        0.083            0.083
mAUC, HP-T       0.153        0.157            0.164

Page 21: HardNet: Convolutional Network for Local Image Description

Results 

Page 22: HardNet: Convolutional Network for Local Image Description

RESULTS: BROWN DATASET

Page 23: HardNet: Convolutional Network for Local Image Description

RESULTS: W1BS DATASET

Mishkin et al, BMVC 2015

Nuisance factors: appearance, geometry, lighting, sensor

Page 24: HardNet: Convolutional Network for Local Image Description

HPATCHES DATASET

DoG, Hessian, Harris detections in the reference image; ~1300 patches per image kept. Reprojected to the other images with 3 levels of “affine frame noise” added

I (illumination): 57 image sixplets – photometric changes
V (viewpoint): 59 image sixplets – geometric changes

Balntas et al, CVPR 2017

Page 25: HardNet: Convolutional Network for Local Image Description

RESULTS: HPATCHES

Page 26: HardNet: Convolutional Network for Local Image Description

RESULTS: MATCHING WITH VIEW SYNTH

Datasets are already saturated

On par with RootSIFT

Still challenging due to multiple nuisance factors

Zitnick and Ramnath, 2011; Mishkin et al, 2015; Mikolajczyk et al, 2013; Hauagge and Snavely, 2012; Kelman et al, 2007; Fernando et al, 2014

Page 27: HardNet: Convolutional Network for Local Image Description

RESULTS: BOW OXFORD5K & PARIS 6K

Philbin et al 2007, Philbin et al 2008

Page 28: HardNet: Convolutional Network for Local Image Description

RESULTS: HQE OXFORD5K & PARIS 6K

Page 29: HardNet: Convolutional Network for Local Image Description

Thank you for your attention

PDF: https://arxiv.org/abs/1705.10872
Source and models: https://github.com/DagnyT/hardnet