Graph Regularised Hashing



GRAPH REGULARISED HASHING
SEAN MORAN†, VICTOR LAVRENKO†
sean.moran@ed.ac.uk

RESEARCH QUESTION

• Locality sensitive hashing (LSH) [1] fractures the input feature space with randomly placed hyperplanes.

• Can we do better by using supervision to adjust the hyperplanes?
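For context, a minimal sketch of the random-hyperplane idea (this is the common sign-random-projection construction; the function name and the Gaussian projection are our illustrative choices, not details from the poster):

```python
import numpy as np

def lsh_codes(X, K, seed=0):
    """Sign-random-projection LSH: K randomly placed hyperplanes fracture
    the feature space; the side of each hyperplane gives one bit."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], K))  # random hyperplane normals
    codes = np.sign(X @ W)
    codes[codes == 0] = 1  # resolve sgn(0) ties to +1
    return codes           # (N, K) matrix of {-1, +1} bits
```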

INTRODUCTION

• Problem: Constant-time nearest-neighbour search in large datasets.
• Hashing-based approximate nearest neighbour (NN) search:

– Index images/documents into the buckets of hash table(s)
– Encourage collisions between similar images/documents

[Figure: a query and the database items are hashed (H) into compact binary codes (e.g. 110101, 010111, 111101) that index hash-table buckets; similarity is computed only against the nearest-neighbour candidates in the colliding bucket. Applications pictured: content-based IR (image: Imense Ltd), location recognition (image: Doersch et al.), near-duplicate detection (image: Xu et al.).]

• Advantages:
– O(1) lookup per query rather than O(N) brute-force search
– Memory/storage savings due to compact binary codes
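To make the O(1) claim concrete, a toy sketch of bucket lookup (illustrative only; the dictionary-of-lists layout and the function names are ours):

```python
from collections import defaultdict
import numpy as np

def build_hash_table(codes):
    """Index database items by their packed binary hashcode."""
    table = defaultdict(list)
    for i, code in enumerate(codes):
        table[code.tobytes()].append(i)  # one bucket per distinct code
    return table

def lookup(table, database, query_code, query_vec):
    """O(1) bucket lookup; similarity is computed only for colliding items."""
    candidates = table.get(query_code.tobytes(), [])
    sims = [(i, float(database[i] @ query_vec)) for i in candidates]
    return sorted(sims, key=lambda p: -p[1])  # nearest neighbours first
```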

GRAPH REGULARISED HASHING (GRH)

• We propose a two-step iterative hashing model, Graph Regularised Hashing (GRH) [5]. GRH uses supervision in the form of an adjacency matrix that specifies whether or not data-points are related.

– Step A: Graph Regularisation: the K-bit hashcode of a data-point is set to the sign of the average of the hashcodes of its nearest neighbours, as specified by the adjacency graph:

Lₘ ← sgn(α S D⁻¹ Lₘ₋₁ + (1−α) L₀)

∗ S: affinity (adjacency) matrix
∗ D: diagonal degree matrix
∗ Lₘ: binary bits at iteration m
∗ α ∈ [0, 1]: linear interpolation parameter

– Step A is a simple sparse-sparse matrix multiplication and can be implemented very efficiently; any existing hash function, e.g. LSH [1], can be used to initialise the bits in L₀. A minimal sketch of this step follows below.
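A minimal NumPy/SciPy sketch of the Step A update (our own illustrative implementation; the function signature and the sgn(0) → +1 tie-break are assumptions, not from the poster):

```python
import numpy as np
from scipy import sparse

def graph_regularise(S, L_prev, L0, alpha=0.5):
    """One Step A update: L_m = sgn(alpha * S D^-1 L_{m-1} + (1-alpha) * L0).

    S      : (N, N) sparse affinity matrix (S_ij = 1 if points i, j are related)
    L_prev : (N, K) current {-1, +1} hashcodes L_{m-1}
    L0     : (N, K) initial hashcodes, e.g. from LSH
    """
    degrees = np.asarray(S.sum(axis=1)).ravel()
    D_inv = sparse.diags(1.0 / np.maximum(degrees, 1.0))  # avoid divide-by-zero
    L = np.sign(alpha * (S @ (D_inv @ L_prev)) + (1.0 - alpha) * L0)
    L[L == 0] = 1  # resolve sgn(0) ties to +1 (our convention)
    return L
```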

– Step B: Data-Space Partitioning: the hashcodes produced in Step A are used as the labels to learn K binary classifiers. This is the out-of-sample extension step, allowing the encoding of data-points not seen before (see the sketch after the symbol list):

for k = 1…K:   min ‖wₖ‖² + C ∑_{i=1…N_trd} ξᵢₖ

               s.t. Lᵢₖ(wₖᵀxᵢ + tₖ) ≥ 1 − ξᵢₖ   for i = 1…N_trd

∗ wₖ: hyperplane k; tₖ: bias k
∗ xᵢ: data-point i; Lᵢₖ: bit k of data-point i
∗ ξᵢₖ: slack variable; K: # bits; N_trd: # training data-points
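A sketch of Step B using scikit-learn's LinearSVC as the per-bit classifier (the poster specifies only the SVM objective; the library choice and helper names are our assumptions):

```python
import numpy as np
from sklearn.svm import LinearSVC

def learn_hash_functions(X, L, C=1.0):
    """Fit one linear SVM per bit, using the Step A hashcodes as labels.

    X : (N, d) training data-points
    L : (N, K) regularised {-1, +1} hashcodes from Step A
    """
    svms = [LinearSVC(C=C).fit(X, L[:, k]) for k in range(L.shape[1])]
    W = np.vstack([svm.coef_.ravel() for svm in svms])  # (K, d) hyperplanes
    t = np.array([svm.intercept_[0] for svm in svms])   # (K,) biases
    return W, t

def encode(X, W, t):
    """Encode data-points via a simple dot-product: sign(w_k . x + t_k)."""
    codes = np.sign(X @ W.T + t)
    codes[codes == 0] = 1
    return codes
```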

• Steps A-B are repeated for a set number of iterations M (e.g. M < 10). The learnt hyperplanes wₖ can then be used to encode unseen data-points via a simple dot-product, as in the end-to-end sketch below.
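Putting the pieces together, a hedged sketch of the full training loop; feeding Step B's re-encoded bits into the next Step A iteration is our reading of the poster, not something it states explicitly:

```python
def train_grh(X, S, L0, alpha=0.5, M=5, C=1.0):
    """Alternate Steps A and B for M iterations (the poster suggests M < 10)."""
    L = L0
    for _ in range(M):
        L = graph_regularise(S, L, L0, alpha)  # Step A: smooth codes over the graph
        W, t = learn_hash_functions(X, L, C)   # Step B: one SVM per bit
        L = encode(X, W, t)                    # re-encode training points (assumed)
    return W, t  # unseen points are then encoded with encode(X_new, W, t)
```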

STEP A: GRAPH REGULARISATION

• Toy example: nodes are images with 3-bit LSH encodings. Arcs indicate nearest-neighbour relationships. We show two images (c, e) having their hashcodes updated in Step A:

[Figure: a toy graph of eight images (a-h), each labelled with a 3-bit code in {-1, +1}³, e.g. (-1, 1, 1); the codes of c and e are replaced by the sign of the average of their graph neighbours' codes.]
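As a purely hypothetical numeric run of the Step A sketch above (this 3-node graph and its codes are made up for illustration; they are not the figure's actual values):

```python
import numpy as np
from scipy import sparse

# Three mutually related images with 3-bit initial codes.
S = sparse.csr_matrix(np.array([[0., 1., 1.],
                                [1., 0., 1.],
                                [1., 1., 0.]]))
L0 = np.array([[-1.,  1.,  1.],
               [ 1.,  1.,  1.],
               [ 1., -1.,  1.]])
print(graph_regularise(S, L0, L0, alpha=0.5))
# Neighbour averaging pulls related codes together: after one update
# all three rows agree on (1, 1, 1).
```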

STEP B: DATA-SPACE PARTITIONING

• Here we show a hyperplane being learnt using the first bit (highlighted with a bold box) as the label. One hyperplane is learnt per bit:

[Figure: the same eight nodes, with hyperplane w₁ (w₁·x − t₁ = 0) separating the positive (+1) half-space from the negative (−1) half-space according to each node's first bit.]

QUANTITATIVE RESULTS (CIFAR-10) (MORE DATASETS IN PAPER)

• Mean average precision (mAP) image retrieval results using GIST features on CIFAR-10 (^NN: significant at p < 0.01):

CIFAR-10       16 bits     32 bits     48 bits     64 bits
ITQ+CCA [2]    0.2015      0.2130      0.2208      0.2237
STH [3]        0.2352      0.2072      0.2118      0.2000
KSH [4]        0.2496      0.2785      0.2849      0.2905
GRH [5]        0.2991^NN   0.3122^NN   0.3252^NN   0.3350^NN

• Timings (seconds) averaged over 10 runs. GRH is both (1) faster to train and (2) faster to encode unseen data-points:

           Training   Testing   Total
GRH [5]    8.01       0.03      8.04
KSH [4]    74.02      0.10      74.12
BRE [6]    227.84     0.37      228.21

SUMMARY OF KEY FINDINGS

• First supervised hashing model that is both accurate and scalable
• Future work will extend GRH to streaming data sources
• Code online: https://github.com/sjmoran/grh

References:

[1] P. Indyk, R. Motwani: Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In: STOC (1998).
[2] Y. Gong, S. Lazebnik: Iterative Quantization. In: CVPR (2011).
[3] D. Zhang et al.: Self-Taught Hashing. In: SIGIR (2010).
[4] W. Liu et al.: Supervised Hashing with Kernels. In: CVPR (2012).
[5] S. Moran, V. Lavrenko: Graph Regularised Hashing. In: ECIR (2015).
[6] B. Kulis, T. Darrell: Binary Reconstructive Embedding. In: NIPS (2009).