Large-Scale Image Retrieval with Attentive Deep Local Features

2021.04.27Sebin Lee

ICCV 2017

Contents

• Background & Motivation

• Our Approach

• Results

• Summary

• Quiz

Recap: Keypoint

3picture from Sung-eui Yoon slide, originally made by Kristen Grauman and David Lowe

• Important point of image (e.g., corner)• The keypoint itself is not useful for image retrieval.• e.g., Harris corner detector, Local maxima of DoG

Background & Motivation

< Harris corner detector >

< Local maxima of DoG >

Recap: Local Descriptor

• The local descriptor is used because the keypoint itself is not useful.• The local descriptor means a compact representation of the image

patch centered on the extracted keypoint.• e.g., SIFT

< Gradients of image patch > < SIFT descriptor >

Classical Keypoint & Local Descriptor Properties

5picture from Sung-eui Yoon slide, originally made by Denis Simakov

• Repeatable• Distinctive• Invariant• Adequate Quantity

How about deep feature?

Sparse keypoint & descriptor

< Keypoints of Image>

Deep Feature

• CNNs generate uniformly dense grid feature map.• We can regard the dense grid feature map as a grid of local descriptors.

localdescriptor

RGB Image(3, H, W)

Dense grid feature(C, H', W')

CNN layers

receptive field of local descriptor= local image patch

= Densely extracted local descriptors

Motivation

7picture from Sung-eui Yoon slide, originally made by Denis Simakov

• Unlike classical methods, the CNNs generate uniformly dense grid local descriptors.• Therefore, local descriptors are extracted from regions that have no value as keypoints.

(e.g., texture-less region)• Many local descriptors containing unnecessary ones hinder search and making codebook.

Query DB

< Query feature contains unnecessary local descriptors (people) >

∴We needs keypoint selection that selects only helpful keypointsfor efficient and accurate image retrieval.

Contents

• Our Approach

• Results

• Summary

• Quiz

• Train a local descriptor network using keypoint selection for efficient and accurate image retrieval.

Our Approach

Local descriptor network

< Overall Pipeline of Image Retrieval >

Our goal

Our Approach

< Overall Pipeline of Image Retreival >

Our goal

Our Approach

< Overall Pipeline of Image Retreival >

Our goal

Additionallayers only for

training

Train Local Descriptor Network by using Classification task

Our Approach

Input image

Grid local descriptors

LabelC.ELoss

• Backbone: ResNet50 trained on ImageNet• Fine tune the local descriptor network on Google Landmarks dataset.

(Image Retrieval dataset)

• Use only cross-entropy loss (loss of classification task)• However, this method doesn’t perform keypoint selection.

Local Descriptor Network

Training for Keypoint Selection

Our Approach

Input image

Attention score (channel: 1)

Sparse local descriptors by keypoint selection

• Keypoint selection can be performed by attention.• Attention module is used to calculate attention score.• Attention score is close to 0: no keypoint• Attention score is close to 1: keypoint

LabelC.ELoss

Local Descriptor Network

∴We can train the local descriptor network performing keypoint selection by end-to-end manner.

Additionallayers only for

training

Comparison with classical local descriptor

• Classical Local descriptor① keypoint selection② local descriptor extraction

• Ours① local descriptor extraction② keypoint selection

• The order of process is different, but the results are similar.

Our Approach

Image retrieval pipeline with ours

• Overall image retrieval pipeline with our local descriptor network

Our Approach

Sparse local descriptors by keypoint selection

Contents

• Our Approach

• Results

• Summary

• Quiz

Precision & Recall Result

Results

• DELF: Backbone trained on ImageNet• FT: Fine Tune with Google Landmarks• ATT: Use Attention for keypoint selection• QE: Use Query Expansion

Absolute Recall(#true positives w.r.t all queries)

PR curve @ Google Landmarks dataset

Our full model

Our Backbone

mAP Result

Results

Mean average precision: mAP(%)

• DELF: Backbone trained on ImageNet• FT: Fine Tune with Google Landmarks• ATT: Use Attention• QE: Use Query Expansion• DIR: Use global descriptor

Keypoint Selection Result(Attention Result)

Results

DELF DELF+FT DELF+FT+ATT

DELF DELF+FT+ATT

DELF+FT

• Keypoint selection effectively disregards clutter.• Attention activates on important pixels for image retrieval.

DELF: Backbone trained on ImageNetFT: Fine Tune with Google LandmarksATT: Use Attention

Contents

• Our Approach

• Results

• Summary

• Quiz

Contributions

Summary

• Keypoint selection using the attention module• Unnecessary region descriptors are suppressed. (e.g., texture-less region) • Only sparse local descriptors that are useful for image retrieval are extracted.∴Search efficiency ↑, Accuracy ↑

< Result of Keypoint Selection>

Strengths & Weaknesses

• Strengths• Proposed method can extract both local descriptors and keypoints via one forward pass.• Efficient and accurate image retrieval can be performed by keypoint selection using attention.

• Weaknesses• This paper trains the descriptor network using only classification task.

(Do not use other metric learning methods)• The image label is required to train the network.

Summary

Contents

• Background

• Related work

• Our Approach

• Results

• Quiz

• Please submit this google form.

Link will be posted in the regular zoom meeting session.

THANK YOU

Large-Scale Image Retrieval with Attentive Deep Local Features

Documents

Transcript of Large-Scale Image Retrieval with Attentive Deep Local Features

Learning to Match Aerial Images with Deep Attentive ... · Learning to Match Aerial Images with Deep Attentive Architectures Hani Altwaijry1,2, Eduard Trulls3, James Hays4, Pascal

Deep Learning for Information Retrieval - Hang Li Outline •Introduction to Huawei Noah’s Ark Lab •Deep Learning – New Opportunities for Information Retrieval •Three Useful

Medical Image Retrieval using Deep Convolutional Neural ...

Person Retrieval in Video Surveillance Using Deep Learning ...

Deep Memory Network for Cross-Modal Retrieval

Large-Scale Image Retrieval with Attentive Deep Local … · Large-Scale Image Retrieval with Attentive Deep Local Features Hyeonwoo Nohy Andre Araujo´ Jack Sim Tobias Weyand Bohyung

Effective Deep Learning Based Multi-Modal Retrieval

Large-Scale Image Retrieval with Attentive Deep Local FeaturesInternational Conference on Computer Vision 2017 Large-Scale Image Retrieval with Attentive Deep Local Features 1POSTECH,

Deep learning music information retrieval for content ...

cvteam.netcvteam.net/papers/2020-JSTSP-Attentive Deep Stitching and Quality... · 1 Attentive Deep Stitching and Quality Assessment for 360 Omnidirectional Images 1 2 3 Jia Li , Senior

Deep Multi-Graph Clustering via Attentive Cross-Graph ...personal.psu.edu/dul262/DMGC/dmgc_poster.pdf · Deep Multi-Graph Clustering via Attentive Cross-Graph Association Dongsheng

Disaster Impact Information Retrieval Using Deep Learning ...

Pycon2016- Applying Deep Learning in Information Retrieval System

Robust Attentive Deep Neural Network for Exposing GAN ...

Near-Duplicate Video Retrieval with Deep Metric Learning

Scalable Deep Multimodal Learning for Cross-Modal Retrieval · Scalable Deep Multimodal Learning for Cross-Modal Retrieval Peng Hu∗ Machine Intelligence Laboratory College of Computer

Near-Duplicate Video Retrieval With Deep Metric Learningopenaccess.thecvf.com/.../papers/w5/Kordopatis-Zilos_Near-Duplicate... · Near-Duplicate Video Retrieval with Deep Metric Learning

Exploring deep learning for intelligent image retrieval

DRAW: Deep Recurrent Attentive Writer

New Oracle Character Image Retrieval by Combining Deep Neural … · 2020. 5. 28. · Oracle Character Image Retrieval by Combining Deep Neural Networks and Clustering Technology