Secure and Efficient Image Retrieval over Encrypted Cloud...

15
Research Article Secure and Efficient Image Retrieval over Encrypted Cloud Data Haihua Liang , 1,2 Xinpeng Zhang, 1,3 Hang Cheng, 4 and Qiuhan Wei 1 1 School of Communication and Information Engineering, Shanghai University, Shanghai, China 2 Department of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China 3 Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Joint International Research Laboratory of Specialty Fiber Optics and Advanced Communication, Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai, China 4 College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China Correspondence should be addressed to Haihua Liang; [email protected] Received 28 September 2017; Revised 7 April 2018; Accepted 6 June 2018; Published 8 July 2018 Academic Editor: Zheng Yan Copyright © 2018 Haihua Liang et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. is paper proposes a novel image retrieval scheme over encrypted cloud data, which achieves high efficiency and confidentiality. For the purpose of improving search efficiency, an index tree is oſten deployed in the image retrieval scheme. Meanwhile, the confidentiality of the sensitive cloud data, such as outsourced images, index tree, and query request, is also a key issue. Firstly, a balanced binary clustering algorithm is exploited over the integrated image features composed of basic features, such as HSV histogram and DCT histogram, yielding a balanced binary tree (BBT). In particular, due to the adoption of a balanced index tree, our scheme can achieve logarithmic search time. Secondly, the secure inner product is employed to encrypt the index vector and query feature. Finally, to resist the statistical attack of the frequency distribution of the retrieved results, we copy the database and merge the subtree of encrypted BBT to blind the search results. Security analysis and experimental results show that the proposed scheme is secure and efficient. 1. Introduction With the arrival of the cloud computing, secure image retrieval on the cloud has attracted more and more attention in recent years [1]. Due to high convenience and economic savings, both enterprises and individuals tend to employ the cloud to store and manage their sensitive data, such as photo albums and personal health documents. To ensure the confidentiality of outsourced data, the common approach is to encrypt the data before it is outsourced to the cloud. However, the traditional encryption may make the basic data operations unfeasible, for example, the information retrieval of encrypted data. In the ciphertext scenario, how to achieve an efficient retrieval while protecting the privacy of customers becomes a challenging problem. At present, there are three main issues that restrict the development of information retrieval in the encrypted domain. e first issue is to realize searchable functional- ity on encrypted data and achieve the same precision as plaintext data. Certainly, a naive approach is to download all the ciphertext, decrypt them, and search locally in the plaintext. However, it will incur heavy cost of bandwidth and computation. To address this issue, cryptographic tech- niques, such as homomorphic encryption [2] and multiparty computation, can be employed to encrypt the plaintext data and support search operation in the ciphertext. However, the above methods are concerned more with data confidentiality than retrieval efficiency, and the cost is expensive in practi- cal applications. In the contrary, some efficient techniques, such as order-preserving encryption (OPE) [3, 4], random- ized hash functions [5–7], and asymmetric scalar-product- preserving encryption (ASPE) [8], are widely adopted. e reason is that they take both the data confidentiality and retrieval efficiency into account. e second issue is that although plaintext content of the encrypted data is not leaked in the above schemes, some statistical information, such as the request frequency of encrypted query (i.e., query pattern) or the access frequency of encrypted result (i.e., Hindawi Security and Communication Networks Volume 2018, Article ID 7915393, 14 pages https://doi.org/10.1155/2018/7915393

Transcript of Secure and Efficient Image Retrieval over Encrypted Cloud...

Page 1: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

Research ArticleSecure and Efficient Image Retrieval over Encrypted Cloud Data

Haihua Liang 12 Xinpeng Zhang13 Hang Cheng4 and Qiuhan Wei1

1 School of Communication and Information Engineering Shanghai University Shanghai China2Department of Computer Science and Engineering Changshu Institute of Technology Suzhou China3Key Laboratory of Specialty Fiber Optics and Optical Access Networks Joint International Research Laboratory ofSpecialty Fiber Optics and Advanced Communication Shanghai Institute for Advanced Communication and Data ScienceShanghai University Shanghai China

4College of Mathematics and Computer Science Fuzhou University Fuzhou China

Correspondence should be addressed to Haihua Liang lhhcslgeducn

Received 28 September 2017 Revised 7 April 2018 Accepted 6 June 2018 Published 8 July 2018

Academic Editor Zheng Yan

Copyright copy 2018 Haihua Liang et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

This paper proposes a novel image retrieval scheme over encrypted cloud data which achieves high efficiency and confidentialityFor the purpose of improving search efficiency an index tree is often deployed in the image retrieval scheme Meanwhile theconfidentiality of the sensitive cloud data such as outsourced images index tree and query request is also a key issue Firstlya balanced binary clustering algorithm is exploited over the integrated image features composed of basic features such as HSVhistogram and DCT histogram yielding a balanced binary tree (BBT) In particular due to the adoption of a balanced index treeour scheme can achieve logarithmic search time Secondly the secure inner product is employed to encrypt the index vector andquery feature Finally to resist the statistical attack of the frequency distribution of the retrieved results we copy the database andmerge the subtree of encrypted BBT to blind the search results Security analysis and experimental results show that the proposedscheme is secure and efficient

1 Introduction

With the arrival of the cloud computing secure imageretrieval on the cloud has attracted more and more attentionin recent years [1] Due to high convenience and economicsavings both enterprises and individuals tend to employthe cloud to store and manage their sensitive data such asphoto albums and personal health documents To ensure theconfidentiality of outsourced data the common approachis to encrypt the data before it is outsourced to the cloudHowever the traditional encryption maymake the basic dataoperations unfeasible for example the information retrievalof encrypted data In the ciphertext scenario how to achievean efficient retrieval while protecting the privacy of customersbecomes a challenging problem

At present there are three main issues that restrictthe development of information retrieval in the encrypteddomain The first issue is to realize searchable functional-ity on encrypted data and achieve the same precision as

plaintext data Certainly a naive approach is to downloadall the ciphertext decrypt them and search locally in theplaintext However it will incur heavy cost of bandwidthand computation To address this issue cryptographic tech-niques such as homomorphic encryption [2] and multipartycomputation can be employed to encrypt the plaintext dataand support search operation in the ciphertext However theabove methods are concerned more with data confidentialitythan retrieval efficiency and the cost is expensive in practi-cal applications In the contrary some efficient techniquessuch as order-preserving encryption (OPE) [3 4] random-ized hash functions [5ndash7] and asymmetric scalar-product-preserving encryption (ASPE) [8] are widely adopted Thereason is that they take both the data confidentiality andretrieval efficiency into account The second issue is thatalthough plaintext content of the encrypted data is not leakedin the above schemes some statistical information suchas the request frequency of encrypted query (ie querypattern) or the access frequency of encrypted result (ie

HindawiSecurity and Communication NetworksVolume 2018 Article ID 7915393 14 pageshttpsdoiorg10115520187915393

2 Security and Communication Networks

access pattern) may leak the privacy of query user ObliviousRAMs [9] is a solution to protect the access pattern butnot practical enough The third issue is that a retrievalscheme of linear efficiency is not desirable because the searchtime will increase as the dataset becomes larger In factsecure information retrieval is commonly used for images ordocuments stored in a cloud serverWedetail themas follows

On the one hand abundant works have been put forwardto achieve secure retrieval in the encrypted documents Forexample Boolean search based on the single keyword isseparately proposed in symmetric key setting [11] and publickey setting [12] Since similarity search is more practical thanBoolean search multi-keyword ranked search [13] is studiedto enrich search functionality and improve result accuracywhere each document is associated with an index vectorEach element of the vector indicates whether a keyword existsor represents its ldquoterm frequency (TF) times inverse documentfrequency (IDF)rdquo Then the k-nearest neighbor is found bycomparing the cosine similarity between the query vectorand all index vectors which is linear efficiency In orderto improve retrieval efficiency a few works based on indextree are proposed For example Sun et al [14] present atree-based search scheme that constructs index vectors ofall documents as an MDB-tree It achieves sublinear searchefficiency via setting the prediction threshold for each levelof the index tree Although a tighter prediction value canobtain logarithmic search efficiency the result accuracy issacrificed at the same time Moreover Xia et al [15] builda KBB-tree from a bottom-up approach In KBB-tree theelement of internal node vector is the maximum value of thecorresponding position of its child node vectors A ldquoGreedy-Depth-First Searchrdquo algorithm is executed to find k mostrelevant leaf nodes which are stored in an RList If thecorrelation score between the query vector and the internalnode vector is smaller than the minimum score in the RListthe subtree of the internal node does not need to be searchedThus this scheme can also achieve sublinear efficiency

On the other hand some works have been proposed forencrypted image retrieval In [16] a privacy-enhanced facerecognition is realized with a help of Paillier homomorphicencryption (HE) The drawback is that the adoption of HEincurs heavy cost of computation and communication Tobe practical Lu et al propose a secure content-based imageretrieval (CBIR) scheme based on featureindex randomiza-tion [17] or min-Hash [5] Meanwhile the performance com-parison between homomorphic encryption and distance-preserving randomization is studied in [18] Due to the one-way and binary property of hash code secure CBIR thatexploits the hash function to encrypt features is effective andefficient in the large-scale database [6] However the accesspattern is leaked in the above schemes To solve this issueWeng et al [7] omit certain bits of the hash code of queryimage As a result the cloud returns all possible candidates tothe user Therefore the query pattern and access pattern areprotected But the user is involved to compare the features ofcandidates and obtain an accurate result Also it is difficult togenerate hash codes that uniformly distributed in the featurespace In further under the vector space model there is onlya handful of works that support efficient index structure

For example Xia et al [19] use local-sensitive hash (LSH) toconstruct a prefilter table but a refinement of the candidateresults is also a linear comparison Thus it just achievessublinear search efficiency Yuan et al [20] employ k-means tobuild an index tree Since k-means is not a balanced clusteringalgorithm it is inevitable to generate an index tree of skewedhierarchies Subsequently due to uneven depth in differentparts of index tree the search efficiency tends to be sublinearIn short under the vector space model the requirementfor secure and efficient image retrieval mechanisms remainsopen up to date

Besides the above schemes that based on the encryptedfeature the secure retrieval scheme directly based onencrypted images has been proposed It is designed for thewidely used JPEG image For example Zhang and Chengfirst use DC histogram preserved in the encrypted image as afeature [21] and then exploit AC histogram [22] and a newblock feature [23] for retrieval However the efficiency oflinear comparison or block-wise comparison is unacceptableTherefore the construction of secure index tree is necessary

In this paper we propose an efficient image retrievalschemeover encrypted cloud data which based on a balancedindex tree ie balanced binary tree (BBT) We introducea novel balanced binary clustering (BBC) algorithm anduse integrated features to build the BBT Owing to the factthat this paper realizes similarity search we tend to utilizeglobal feature (such as color shape) instead of accuratelocal feature (such as SIFT BRIEF) Hence our integratedfeature is composed of widely used HSV histogram andDCT histogram [5 21 22 24] We focus on the constructionof index tree and protection of query pattern and accesspattern As for the image encryption we adopt partial imageencryption technique [23 25 26] for JPEG format imageand mature cryptographic ciphers (such as AES) for theother format images Note that the key distribution is anindependent issue and out of the scope of this paper Ourmain contributions lie in three aspects

(1) We combine HSV histogram in the color space andDCT histogram in the transform domain as an integratedfeature To improve the search efficiency we construct a BBTfor the image database The BBT is built from a top-downapproach Since the hierarchical index tree is approximatelybalanced the scheme can achieve logarithmic search time

(2) By incorporating secure inner product (ie ASPE)the BBT is encrypted as secure BBT (SBBT) by a secretkey and query feature is encrypted The database image isencrypted by another secret key After that the sensitivedata in the cloud is protected meanwhile the searchablefunctionality of the encrypted feature is still valid Furtherthe random number extended in the query feature can blindthe query pattern

(3) To blind the search result we copy the database andmerge the subtree to rebuild SBBT The reduced SBBT andthe weighted factor of the integrated feature make the searchresult of one query image not unique Therefore the queryprivacy about access pattern is protected

The rest of this paper is organized as follows In Section 2we introduce system model threat model and design goalsAlso commonnotations are defined In Section 3 we propose

Security and Communication Networks 3

data owner authorized user

authorized user

encryptedqueryencrypted

index tree

encryptedimages

return

(access control key distribution)query encryption key amp image decryption keys

semi-trustedcloud server

Figure 1 A secure and efficient image retrieval scheme over encrypted data

BBC algorithm use BBC to build a BBT and adopt ASPE toencrypted BBT Section 4 presents and discusses experimen-tal results Finally we conclude the paper in Section 5

2 Problem Formulation

21 SystemModel The system contains three parties the dataowner the cloud server and authorized users as illustrated inFigure 1

The owner first extracts feature vectors such as HSVhistogram and DCT histogram from each image in hisdatabase They are concatenated into an integrated featureAfter the feature extraction the plaintext image is encryptedas the ciphertext by a secret key For improving searchefficiency the owner builds an index tree based on theintegrated features Then he utilizes another secret key toencrypt the index tree The leaf node of the secure index treeis associated with the encrypted image Afterward the owneroutsources the encrypted database images and the secureindex tree to the cloud In addition the owner sends secretkeys to each authorized user through a secure channel

The authorized user extracts feature vectors from thequery image To fetch encrypted images in the secure indextree he encrypts the integrated feature of a query image andsubmits it to the cloud

The cloud stores the encrypted database images andsecure index tree from the owner After receiving anencrypted query feature the cloud executes the search opera-tion in the secure index tree After that he returns encryptedimages that associated with retrieved leaf node to the user

22 Thread Model We assume that three parties follow asemihonest model in which any two parties do not colludeto share information However even if the cloud does not

know the content of the encrypted query it may count thequery frequency of same encrypted request and the accessfrequency of the same returned result Thus depending onthe available information to the cloud we consider two threatmodels here

Known Ciphertext Model The cloud only knowsencrypted database secure index tree and the encryptedquery

Known Background Model In this stronger model thecloud knows background knowledge about the frequencydistribution of the query images For example Trumprsquosphotos are very popular in the presidential campaign Thecloud can conduct a statistical attack to identify certain imagevia the access frequency of the returned images

23 Design Goals To enable secure image retrieval overencrypted cloud data under the above models our schemeshould achieve the following design goals

Effectiveness The scheme is designed to support searchover encrypted data The result is as accurate as the search inthe plaintext domain

Efficiency Under vector space model we structure thefeature vectors of image database as an SBBT Instead oflinear or sublinear search efficiency our scheme will achievelogarithmic search time

Privacy The goal is to protect the privacy of ownerand user which is summarized as follows (1) The databaseand its index tree confidentiality namely the image and itsindex tree exist in the ciphertext form in the cloud (2) Thequery privacy it is not possible to determine whether tworequests are from the same query image Note that to ease theoverload of computation and communication our goals arenot to fully protect access pattern ie the search path in theSBBT

4 Security and Communication Networks

24 Notations Assume that the total number of images in thedatabase is119873 We define common notations as follows

(i) D the database namely the set of images

(ii) C the ciphertext form of D stored in the cloud

(iii) p1119894 the normalized feature such as HSV histogramfrom the 119894th image 1 le 119894 le 119873 it is a 1198891-dimensionalrow vector

(iv) p2119894 the normalized feature such as DCT histogramfrom the 119894th image 1 le 119894 le 119873 it is a 1198892-dimensionalrow vector

(v) 997888rarrp 119894 the integrated feature 997888rarrp 119894 = [120572p1119894 (1 minus 120572)p2119894 ] aweighted factor 120572 isin [0 1]

(vi) k the index vector in each node of BBT it is generatedfrom 997888rarrp119894

(vii) I the119867minusheight BBT for the whole image database D119867 asymp lceillog1198732 rceil(viii) V the ciphertext index which corresponds to plain-

text k

(ix) T the ciphertext form of I

(x) q1 the normalized feature of the query image such asHSV histogram

(xi) q2 the normalized feature of the query image such asDCT histogram

(xii) 997888rarrq the query vector 997888rarrq = [120573q1 (1 minus 120573)q2] a weightedfactor 120573 isin [0 1]

(xiii) M the encryption key M isin R(119889+1)times(119889+1) is a randominvertible matrix 119889 = 1198891 + 1198892

(xiv) Q the ciphertext query corresponding to plaintext 997888rarrqTo find the k-nearest neighbor of the query image we

adopt ldquoinner product similarityrdquo to evaluate the distanceamong the features Specifically 997888rarrp 119894 is an integrated featureof the database image including HSV histogram and DCThistogram 997888rarrq is the integrated feature of the query imageThefinal similarity score is obtained by997888rarrp 119894 ∙997888rarrq Based on the scorethe most relevant images in the database are returned to theuser

3 Proposed Scheme

In order to achieve logarithmic search efficiency we firstbuild a balanced index tree for the plaintext image databasein which a novel balanced binary clustering algorithm isproposed Then to ensure the confidentiality of index vectork and query vector 997888rarrq the secure inner product computationis employed to encrypt the vector k and 997888rarrq Also we adopt apartial image encryption to encrypt database images Finallywe improve the scheme to enhance privacy protection in theknown background model

31 BBC Algorithm Balanced Binary Clustering To buildthe balanced index tree we need a balanced binary clus-tering (BBC) algorithm to find a separating vector k whichdivides each dataset into two equal (or nearly equal) clustersHowever traditional k-means aims at finding cluster centersso that the sum of the distance between each point andits nearest center is maximized rather than the equal sizeof clusters Thus frequency sensitive competitive learning(FSCL) [27] is introduced to improve k-means FSCL is aconscience type algorithm that takes the size of clusters asa weight factor which makes the bigger cluster less likely towin more points in next iteration It can converge to a localminimum [28] Although the difference of cluster centerscan be used as a separating vector k FSCL does not ensurethat the k clusters are equally sized In fact it is experiencedas very unstable in high-dimensional vector spaces [29] Onthe contrast balanced k-means [30] can split a dataset intotwo equal-sized clusters by finding a perfect matching ina bipartite graph But it does not generate the separatingvector k Thus we propose a novel BBC to find a separatingvector that makes the size of two clusters equal To our bestknowledge this work is the first endeavor to combine FSCLand balanced k-means to achieve balanced clustering Thedetail of BBC algorithm is elaborated in the following

Given 119899 points 997888rarrp 119894119899119894=1 isin R119889 at the 119905th iteration the sizeof cluster is 120591(119905)

119896ge 0 and the cluster center is C(119905)

119896 119896 = 1 2

The BBC algorithm works as followsInitializationWeadopt k-means++ to choose twopoints

from 997888rarrp 119894119899119894=1 as initial centers C(1)119896

119896 = 1 2First uniformly choose an initial center C(1)1 from 997888rarrp 119894119899119894=1Second choose the next initial center C(1)2 from 997888rarrp 119894 997888rarrp 119894 =

C(1)1 119899119894=1 with probability D⟨997888rarrp 119894 C(1)1 ⟩ sum119863⟨997888rarrp 119894 C(1)1 ⟩119863⟨997888rarrp 119894 C(1)1 ⟩ = 1 minus (997888rarrp 119894 ∙ C(1)1 )997888rarrp 119894C(1)1 Finally according to the distance to the initial centers

other points are divided into two categories 119862119871119896 the size ofwhich is 120591(1)

119896 119896 = 1 2

Cluster Assignment Let new 119879(119905)119894119896

be the solution ofoptimization problem as follows

min 119869 (119879(119905)119894119896 ) = 2sum119896=1

(120591(119905)119896 sdot 119899sum119894=1

119879(119905)119894119896119863⟨997888rarrp 119894 C(119905)119896 ⟩) (1)

st

119879(119905)119894119896 isin 0 1 119894 = 1 119899 119896 = 1 22sum119896=1

119879(119905)119894119896 = 1 119894 = 1 119899119899sum119894=1

119879(119905)119894119896 ge lfloor1198992rfloor 119896 = 1 2(2)

where119863⟨997888rarrp119894 C(119905)119896 ⟩ = 1minus(997888rarrp 119894 ∙C(119905)119896 )997888rarrp 119894C(119905)119896 A smaller valueof119863⟨997888rarrp 119894 C(119905)

119896⟩ indicates that the two vectors 997888rarrp 119894 C(119905)

119896are more

similar

Security and Communication Networks 5

Inspired by balanced k-means we first turn (1) as a linearassignment problem (LAP) [31] The cost matrix is defined as

119862119864 =[[[[[[[[[[

119863⟨997888rarrp 1 C(1)119896

⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(1)119896

⟩119863⟨997888rarrp 1 C(2)119896 ⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(2)119896 ⟩

sdot sdot sdot119863 ⟨997888rarrp 1 C(1)

119896⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(1)

119896⟩

119863⟨997888rarrp 1 C(2)119896

⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(2)119896

]]]]]]]]]]

(3)

where119862119864 isin R2lceil1198992rceiltimes119899 If it is not a square matrix a column ofzeros is added to make it square Then we adopt Volgenant-Jonker (VJ) algorithm [32 33] to find a perfect matching in abipartite graph Finally if the vector 997888rarrp 119894 is assigned to centerC(119905)119896 119879(119905)119894119896

= 1 and 0 otherwiseCluster Update Update the cluster centers C(119905+1)

119896 the

ldquoselectionrdquo variables 119879(119905+1)119894119896 and the size of clusters 120591(119905+1)119896 asfollows

C(119905+1)119896 = sum119899119894=1 119879(119905)119894119896997888rarrp 119894sum119899119894=1 119879(119905)119894119896 119896 = 1 2119879(119905+1)1198941 = 1119879(119905+1)1198942 = 0

if 119863⟨997888rarrp 119894 C(119905+1)1 minus C(119905+1)2 ⟩ le 0119879(119905+1)1198941 = 0119879(119905+1)1198942 = 1

if 119863⟨997888rarrp 119894 C(119905+1)1 minus C(119905+1)2 ⟩ gt 0120591(119905+1)119896 = 119899sum

119894=1

119879(119905+1)119894119896 119896 = 1 2

(4)

Cluster assignment and cluster update are repeated until120591(119905+1)119896

le 2lceillog1198992rceilminus1 or C(119905+1)119896

= C(119905)119896 119896 = 1 2

Output The above three steps run several times withdifferent initial centers C(1)1 C(1)2

Owing to the fact that different initial centers maygenerate different results the result that the sizes of twoclusters are most similar is chosen as a final output The sizesimilarity of two clusters is defined as

119904 (120591(119905+1)119896 ) =

0 prod119896=12

120591(119905+1)119896

= 0min 120591(119905+1)

119896 119896 = 1 2

max 120591(119905+1)119896

119896 = 1 2 prod119896=12

120591(119905+1)119896

= 0 (5)

Finally the output is997888rarrp 119894 isin 119862119871119896 if 119879(119905+1)119894119896 = 1 119894 = 1 119899 119896 = 1 2k = [ C(119905)110038171003817100381710038171003817C(119905)1

10038171003817100381710038171003817 minus C(119905)210038171003817100381710038171003817C(119905)210038171003817100381710038171003817] (6)

The dataset of 119899 points can be divided into two equal (nearlyequal) clusters by the separating vector k

32 UBBT Unencrypted Balanced Binary Tree In thissubsection we build the BBT for the whole database Firstlywe generate an integrated feature 997888rarrp 119894 for each image in thedatabase Then an index vector k of the root node or internalnode of the BBT is generated by a top-down approach BBCalgorithm generates k to separate image features into twoclusters of nearly equal size ie 1198621198711 1198621198712 Namely

997888rarrp 119894 ∙ k ge 0 if 997888rarrp 119894 isin 1198621198711997888rarrp 119894 ∙ k lt 0 if 997888rarrp 119894 isin 1198621198712 (7)

Algorithm 1 recursively divides feature vectors of the databaseimages into two clusters until no cluster has more thantwo feature vectors The hierarchical index tree is shown inFigure 2 The level of the root node is 0 and the level of theleaf node is119867 Thus the height of entire index tree is119867 Thesubtree that the root node is at ℎth level of the index tree hasthe height of119867minusℎ The root and internal node 119906 in our BBTare defined as follows

119906 = ⟨119868119863 k 119906119871 119906119877⟩ (8)

where 119868119863 is the identity of node k is the index vector dividingimages on the leaf nodes into two clusters 1198621198711 and 1198621198712 and119906119871 and 119906119877 are the pointers to the left and right child node If119906 is a leaf node it is defined as

119906 = ⟨119868119863 119894119871 119894119871⟩ (9)

where 119894119871 points to the imagesThe search process starts from the root node of UBBT

If 997888rarrq ∙ k ge 0 we execute iterative retrieval upon the leftsubtree otherwise upon the right subtree until obtaining theleaf node

33 SBBT Secure Balanced Binary Tree InUBBT scheme theleaf node is retrieved by the sign of the inner product 997888rarrq ∙k In order to protect the sensitive data 997888rarrq and k we use thesecure inner product (ie ASPE) [8] to encrypt the vectorsand obtain the same result as the plaintext

First of all the owner generates a random invertiblematrix M isin R119889times119889 to encrypt k The user also use it to encrypt997888rarrq ie

V = kMminus1

Q = M997888rarrq T (10)

where V is the ciphertext index and Q is the ciphertext queryIn addition the JPEG images are encrypted via a partialimage encryption [22 25]Therefore index tree I is encryptedas T The leaf node of T is associated with the encryptedimages Then they are outsourced to the cloud for secureimage retrieval When retrieving the user submits Q of thequery image to the cloud Since VQ = 997888rarrq ∙ k the cloud can

6 Security and Communication Networks

Input 997888rarrp 119894 1 le 119894 le 119873 Balanced Binary Clustering ieBBC

Output Balanced Binary Tree I(1) ℎ = 0 the height of node(2) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 1198941le119894le119873) 997888rarr root node(3) function 119862ℎ119894119897119889119903119890119899119866119890119899 (kectors 997888rarrp 119894)(4) 119861119861119862 (997888rarrp 119894) 997888rarr two clusters1198621198711 1198621198712

separating vector as the index k(5) if the size of 1198621198711 ge 2 then(6) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198711) 997888rarr internal node(7) else(8) 119894th image119894isin1198621198711 997888rarr leaf node(9) end if(10) if the size of 1198621198712 ge 2 then(11) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198712) 997888rarr internal node(12) else(13) 119894th image119894isin1198621198712 997888rarr leaf node(14) end if(15) ℎ = ℎ + 1(16) return k 1198621198711 1198621198712(17) end function

Algorithm 1 Building balanced binary tree

leaf node H

internal node h

root node 0

internal node 1

v

CL1 CL2

subtree

Figure 2 The hierarchical structure of unencrypted BBT

use Q to execute an effective search on secure index tree TTherefore the retrieved result and efficiency in the ciphertextwould always be in line with that in the plaintext Note thatthe more details of ASPE are described in [8]

Unfortunately the relationship between 997888rarrq and Q isdeterministic According to the same Q or VQ the cloud cancount the query frequency of 997888rarrq Further according to thesame result the cloud can also count the access frequencyof 997888rarrq Therefore if the query image and 997888rarrq is a one-to-onerelationship the proposed scheme cannot resist the statisticalattack in the known background model

To break the one-to-one relationship between the queryimage and ciphertext query Q we extend 997888rarrq and k as follows

997888rarrq = 120579 [120574 997888rarrq ]k = [0 k] (11)

where the random numbers 120579 gt 0 120574 ≫ 1 Thus thesame query image can generate different ciphertext queryQ by setting 120579 and 120574 Meanwhile VQ = 120579(997888rarrq ∙ k) isblinded by positive 120579The cloud cannot get available statistical

Security and Communication Networks 7

(a) (b) (c)

Figure 3 (a) a query image (b) one encrypted version and (c) another encrypted version

(a) (b) (c) (d)

(e) (f) (g) (h)

Figure 4 In Corel database [10] except for (a)(e) a query image the most similar image by the retrieval based on (b)(f) HSV histogram(c)(g) DCT histogram or (d)(h) an integrated feature

information from the ciphertext query Q and the result VQHowever the sign of VQ remains the same even if 120574 and 120579 aredifferent so the retrieved result and search path remain thesameThe request frequency of a query image can be inferredfrom the access frequency of the same return image

To overcome the issue we first copy the image databaseThe owner uses partial image encryption [23] to protectplaintext image According to different encryption keys sameplaintext image can be encrypted as different encryptedimages as shown in Figure 3 Meanwhile the owner uses theembedding key to hide which encryption key was used [25]Then the owner shares encryption keys and the embeddingkey with authorized users through a secret channel Afterobtaining an encrypted image the user uses the embeddingkey to extract hidden data and select true encryption key torecover the plaintext image

Second different encrypted images derived from thesame plaintext image correspond to different integratedfeatures For the diversity of the integrated feature we useHSV histogram and DCT histogram as basic features Thereason is that HSV histogram in color space and DCThistogram in transform domain are image features fromdifferent perspectives As shown inFigure 4 different features

of the query image can still retrieve a similar image Forease of description we define the new ciphertext databaseas C1015840 Then we set C corresponding to the feature 997888rarrp 119894 =[1205721p1119894 (1 minus 1205721)p2119894 ] and C1015840 corresponds to the feature 997888rarrp 1198941015840 =[1205722p1119894 (1minus1205722)p2119894 ] Finally we use proposed BBC algorithm tobuild an index tree I from 997888rarrp 119894 997888rarrp 1198941015840 C C1015840 and employ ASPEto encrypt the plaintext tree as a ciphertext tree T

To further enhance query privacy we reduce the heightof the index tree T and return multiple encrypted images tothe query user at the one time Specifically we set the internalnode to be a new leaf node Then the new leaf node containsall the encrypted images in the subtree that the internal nodeof T is a root node Due to the fact that the index tree isbalanced leaf nodes of reduced index tree can always pointto multiple encrypted images Please note that some imagesin a subtree are similar which is a compromise with effectiveretrieval

34 Security Analysis We analyze SBBT concerning theprivacy requirements in the design goals

(1) The database and its index are confidential In SBBTindex vectors k of BBT are encrypted as V by the high-dimensional randommatrix which is kept confidential to the

8 Security and Communication Networks

2 3 4 5 6 7 8 9 101 of Category

005

01

015

02

025

03

035

04

Aver

age C

orre

latio

n of

HSV

0

001

002

003

004

095

Aver

age C

orre

latio

n of

DCT

2 3 4 5 6 7 8 9 101 of Category

corrminusintracorrminusintercorrminusdiff

corrminusintracorrminusintercorrminusdiff

Figure 5 The average correlation of HSV histogram (left) and DCT histogram (right) under same category and different categories

cloud For images in the database the partial encryption canprotect their confidentiality [23 25]

(2) Since the feature vector 997888rarrq of the query image isextended by random numbers 120574 and 120579 it can generatedifferent ciphertext queries Q The cloud cannot identify thesame query image from its different ciphertext queries Thequery pattern is protected

(3) According to different 120573 different features of samequery image can retrieve different images When 120573 is equal to1205721 or 1205722 different results can be recovered to the same queryimage Thus the retrieved result and path of the same queryimage are not unique but the retrieval is still effective Thecloud cannot link the different returned image to the samequery image The access pattern is protected

(4) In addition since the height of SBBT is reduced oneleaf node will point to multiple encrypted images It furtherbreaks the one-to-one relationship between the query imageand the return imagesThe access pattern is further protected

In short the SBBT is secure in the known ciphertextmodel and can resist the statistical attack in the knownbackground model

4 Performance Analysis

In this section we show the experimental results andefficiency analysis of the proposed scheme As we knowthat JPEG is the most commonly used image format andaccounting for up to 95 of images on the web [34] sowe adopt the Corel database [10] for the similarity searchThe database contains 1000 JPEG images which belong to 10categories African Beach Architecture Buses DinosaursElephants Flowers Horses Mountain and Food Each ofthem includes 100 images As [5] we split an image into 256blocks and extract hue saturation and value of brightness

of each block Then we perform hierarchical clustering ofblock features to construct the ldquobag of wordrdquo model Theusage frequency of the visual word is the HSV histogramof the image As [22] DCAC coefficients of image blocksare quantized into different bins and the DCT histogram isthe usage frequency of quantized DCAC coefficients Thusthe color feature p1119894 and texture feature p2119894 are set as the991-dimensional HSV histogram and 780-dimensional DCThistogram respectively

41 Parameters Setting First of all we should set parameters1205721 of 997888rarrp 119894 and 1205722 of 997888rarrp 1198941015840 to construct an index tree Becausethe integrated features compose of HSV histograms and DCThistogram we first evaluate average histogram similarity ofthe images in the same category and different categories(denoted as corr-intra and corr-inter respectively) as shownin Figure 5 We can see that the value of corr-intra is largerthan corr-inter thus images of different categories can bedistinguished by each histogram According to the differencebetween corr-intra and corr-inter (ie corr-diff in Figure 5)we set 1205721 = 01 for the integrated feature 997888rarrp 119894 of the databaseC The purpose is to make the HSV histogram and DCThistogram have the same contribution to the search results

For the parameter 1205722 of 997888rarrp 1198941015840 we hope that the index treeof database C C1015840 can better protect the access pattern Weuse two different metrics to measure the privacy protectionachieved by different 1205722 The two metrics are the number ofdifferent image categories in each subtree and the categoryentropy of each subtree Figure 6 shows the average numberof image categories contained in the subtree of differentlevels Because the index tree of the database based ondifferent1205722 has same height of 11 (ie lceillog20002 rceil) the proposedBBC algorithm performs well However we can see from

Security and Communication Networks 9

Table 1 Average category entropy of each level

Parameter Level of the Index Tree1205722 1 2 3 4 5 6 7 8 9 10 1102 299 249 21 175 138 107 081 062 044 021 003 32 282 207 181 143 118 091 07 048 024 004 332 288 228 18 153 128 102 078 054 028 005 332 303 231 183 156 127 103 079 055 03 006 332 291 237 196 165 141 111 089 061 032 007 332 26 21 171 146 124 103 081 057 029 008 332 292 232 185 152 126 104 081 057 03 009 332 292 225 178 149 127 103 083 058 03 0

0

2

4

6

8

10

Aver

age N

umbe

r of C

ateg

orie

s

2 3 4 5 6 7 8 9 10 111Level of Index Tree

2 = 02

2 = 03

2 = 04

2 = 05

2 = 06

2 = 07

2 = 08

2 = 09

Figure 6 The average number of image categories

Figure 6 that the subtree near the leaf node contains a similarnumber of image categories We will use category entropy tomeasure the privacy The category entropy which is definedas minussum119894(119888119894sum119894 119888119894) log2(119888119894sum119894 119888119894) (119888119894 is the number of existing119894 category) is used to measure the randomness of imagecategories in the subtree From Table 1 we know that theindex tree of 1205722 = 06 has the highest average entropy at the3th-10th level even its average category number at 3th-6thlevel is not the biggest in Figure 6 Thus we set the weightedfactor 1205722 = 06 for the integrated feature 997888rarrp 1198941015840 of database C1015840

42 Index Tree Construction In order to prove that ouralgorithm can classify feature vectors in a balanced mannerwe use average similarity of the cluster size to comparethe clustering performance of proposed BBC algorithmbalanced k-means (ie Bkmeans) k-means++ and FSCLThe average similarity based on (5) is defined as 119878 = 119904(120591(119905+1)

119896)

BBCBkmeans

Kmeans++FSCL

0

005

07

075

08

085

09

095

1

Sim

ilarit

y of

Clu

ster S

ize (

S )

5 10 150Level of Index Tree

Figure 7 The average size similarity of two clusters

FromFigure 7 we can see that the index trees generated by theproposed BBC algorithm Bkmeans k-means++ and FSCLhave the height of 11 12 13 and 16 respectively The smallerthe height of the index tree indicates that the proposedalgorithm can classify feature vectors into two clusters moreevenly at each level of the index tree Figure 7 shows thatthe cluster size of FSCL is the most unbalanced at 1th-11thlevels There are two reasons the first is that the change ofthe weight factor affects the stability of the clustering resultand the second is that the value of weight factor impactsthe distance between the cluster center and other featurevectors On contrast due to the use of conscience weightfactor the proposed algorithmperforms better than Bkmeansand k-means++ and the result is much better than FSCLThe reason is that the bipartite graph matching mechanismconstrains a balance of the two clusters and restricts theinfluence of weight factor In short the weight factor is stilleffective in improve the balance of clustering In fact aboveresult is also similar in the database of different 1205721 1205722

10 Security and Communication Networks

Table 2 Minimal Dunn index of each level

Algorithm Level of the Index Tree1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

BBC 047 001 001 002 001 001 001 001 001 002 infin - - - - -Bkmeans 047 001 002 001 002 002 002 003 004 024 073 infin - - - -Kmeans++ 021 001 001 001 001 001 001 001 002 003 022 105 infin - - -FSCL 047 004 005 006 007 01 015 019 023 038 028 058 062 066 095 infin

Table 3 Time cost of index tree construction

Algorithm BBC Bkmeans Kmeans++[20] FSCLTime (s) 111 2316 1749 254

In further we employ Dunn index (DI) [35] to evaluatementioned clustering algorithms The Dunn index is definedas

119863119868 = minforall119909isin1198621198711forall119910isin1198621198712 119863 ⟨119909 119910⟩max119896=12 maxforall119909119910isin119862119871119896 119863 ⟨119909 119910⟩ (12)

where the numerator is the minimal distance of vectors indifferent clusters and the denominator is the largest within-cluster distance Thus we can know that when separatingtwo points into two clusters the denominator is zero andDI is infinite A high value of DI means good compactnessTable 2 shows the minimal DI of clustering results in eachlevel In the same level of the index tree the most of BBCrsquos DIis smaller than other algorithms (ie Bkmeans K-means++and FSCL) Meanwhile BBCrsquos DI of the last level is muchsmaller than other algorithms Obviously for the proposedBBC algorithm the cluster-size similarity is with higherpriority than clustering compactness

The above algorithms are implemented using MATLABon a PC with an Intel Core i5 32 GHz CPU and 16 GB RAMTime consumption for index tree construction is shown inTable 3 In particular we execute 30 times clustering withrandom initial centers and select the result of highest scoreof cluster-size similarity Bkmeans is most time consumingbecause it is not designed to find a vector that splits a datasetevenly In contrast proposed BBC costs the least time

43 Retrieval Performance To evaluate the retrieval perfor-mance each image in the Corel database is selected as a queryimage The retrieval performance is evaluated by averageprecision-recall (PR) curve ie

119875119903119890119888119894119904119894119900119899 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119903119890119905119906119903119899119890119889 119894119898119886119892119890119904

119877119890119888119886119897119897 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119901119900119904i119905119894V119890 119894119898119886119892119890119904

(13)

Figure 8(a) compares the PR curves of the query feature basedon different 120573 values It can be seen that the performanceof different 120573 achieves high precision when recall belongsto (0 005) Besides when only one image is returned ierecall=1 the precision of 120573 = 01 or 06 is 100The reason

is thatwe set120573 as1205721 or1205722 Although two integrated features ofa query image are different the retrieved results are differentencrypted versions of the same plaintext image Since thequery features the corresponding results and search pathsare different the statistical information of query image isprotected

Figure 8(b) shows the retrieval performance of thereduced index trees with different heights It can be seen thatthe precision increases with the height of index tree even if 120573is set as 0 or 1 Further Figure 9 shows the average maximumheight of a reduced index tree inwhich the returned leaf nodecontains the exact query image It can be seen that the heightof reduced index tree is 11 when 120573 is set as 01 or 06 If 120573 is setcloser to 01 or 06 the height of reduced index tree is nearerto 11 When the height is 9 the most of exact query images areretrievable in the reduced index tree and one search returnsfour (22) images Fortunately as shown in Figure 8(b) theprecision is about 70 at the 9th level of the index tree Thusthe retrieval is still effective

44 Privacy Protection We consider the privacy protectionfrom two aspects query pattern and access pattern Since thequery feature is an integrated histogram ie 997888rarrq = [120573q1 (1 minus120573)q2] Different 120573 correspond to different query features Inaddition even if 120573 is determined the feature will be extendedby random numbers 120579 120574 as 997888rarrq = 120579[120574 997888rarrq ] Thus the encryptedfeature is Q = M(120579[120574 120573q1 (1 minus 120573)q2])T Obviously theencrypted feature of a query image is not unique The querypattern of each query image is protected

To protect access pattern the retrieved result of a queryimage should be not uniqueWehave built an index tree basedon features of original database (1205721 = 01) and features of thecopied database (1205722 = 06) Then the integrated features ofdifferent 120573 are used to retrieve in the index tree We know thefact that the returned result is a leaf node of the index treeand different leaf nodes indicate different access paths So wedefine a path privacy to evaluate the protection level of accesspattern

119875119903119894V119886119888119910 = 119900119891 119889119894119891119891119890119903119890119899119905 119904119890119886119903119888ℎ 119901119886119905ℎ119904 119900119891 119886119897119897 119904119890119886119903119888ℎ 119901119886119905ℎ119904 (14)

Different search path can be obtained by using the queryfeature of different 120573 Figures 10(a) and 10(b) show the path

Security and Communication Networks 11

=0=0025=005=0075=01=03

=035=04=06=08=1

40

50

60

70

80

90

100Pr

ecisi

on (

)

5 10 15 20 25 300Recall ()

(a)

0 2 4 6 8 10 120

20

40

60

80

100

Height of Index Tree

Prec

ision

()

=0=0025=005=0075=01=03

=035=04=06=08=1

(b)

Figure 8 (a) Precision-recall curves of original index tree (b) Retrieval precision of reduced index trees

0 0025 005 0075 01 03 035 04 06 08 1

Parameter

8

9

10

11

12

Hei

ght o

f Red

uced

Inde

x Tr

ee

Figure 9 Under different 120573 the average maximum height of areduced index tree that the query image can retrieve itself

privacy of different 120573 where 120573 = 01 and 06 are baselinesrespectively It can be seen that the path privacy is 100when120573 is set as 06 in Figure 10(a) and 01 in Figure 10(b) Thereason is that retrieval is separately carried out in the left orright subtree of the root node Figure 10(a) also shows thatsearch paths are in the same subtree of the root node when120573 is set to 06 08 or 1 But the search path of 120573 = 08 or1 is still different with 120573 = 06 as shown in Figure 10(b)Similarly when 120573 is set to 0 00025 005 0075 01 03 or035 the search path is in the same subtree of the root nodebut still different Besides when 120573 is set to 04 the searchtraverses in both subtrees of the root node In addition thepath privacy increases with the height of the tree In short the

access pattern can be protected by setting different 120573 Pleasenote that usage frequency of different 120573 can be optimizedbased on the frequency distribution of the query images

In summary Table 4 compares the proposed scheme withother schemes in the terms of search efficiency query patternprotection and access pattern protection We compare thesearch efficiency of different schemes by theoretical analysisinstead of experiments The reason is that encrypted imageretrieval schemes are based on different image databasesaccording to the application Also some schemes utilizeencrypted image feature but other schemes employ the hashcode of the image feature

In the schemes under vector space model such as Yuan[20] a skew index tree leads to119874(log119873 sim 119873) search time Xia[19] uses local-sensitive hash (LSH) to construct a prefiltertable Then with corresponding image feature the cloudcarries out a refinement of candidate results Thus it is asublinear scheme Cheng [22 23] does not build an indextree so the retrieval efficiency is 119874(119873) Weng [6 7] usesa hash table to improve the search efficiency (ie 119874(1 sim119873)) In short the index tree or hash table is a trade-offbetween space and efficiency Although Wengrsquos scheme [7]and proposed scheme both protect query pattern and accesspattern the hash code and feature vector are respectivelyused SpecificallyWeng [7] omits certain bits of the hash codeof query image and returns multiple candidates to the userfor refinement The user is involved in completing retrievaland obtaining the accurate result In our scheme the usercan set 120573 as 1205721 or 1205722 to retrieve exact query image throughdifferent paths Unlike [7] our scheme does not need the userto participate in the feature comparison However to protect

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 2: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

2 Security and Communication Networks

access pattern) may leak the privacy of query user ObliviousRAMs [9] is a solution to protect the access pattern butnot practical enough The third issue is that a retrievalscheme of linear efficiency is not desirable because the searchtime will increase as the dataset becomes larger In factsecure information retrieval is commonly used for images ordocuments stored in a cloud serverWedetail themas follows

On the one hand abundant works have been put forwardto achieve secure retrieval in the encrypted documents Forexample Boolean search based on the single keyword isseparately proposed in symmetric key setting [11] and publickey setting [12] Since similarity search is more practical thanBoolean search multi-keyword ranked search [13] is studiedto enrich search functionality and improve result accuracywhere each document is associated with an index vectorEach element of the vector indicates whether a keyword existsor represents its ldquoterm frequency (TF) times inverse documentfrequency (IDF)rdquo Then the k-nearest neighbor is found bycomparing the cosine similarity between the query vectorand all index vectors which is linear efficiency In orderto improve retrieval efficiency a few works based on indextree are proposed For example Sun et al [14] present atree-based search scheme that constructs index vectors ofall documents as an MDB-tree It achieves sublinear searchefficiency via setting the prediction threshold for each levelof the index tree Although a tighter prediction value canobtain logarithmic search efficiency the result accuracy issacrificed at the same time Moreover Xia et al [15] builda KBB-tree from a bottom-up approach In KBB-tree theelement of internal node vector is the maximum value of thecorresponding position of its child node vectors A ldquoGreedy-Depth-First Searchrdquo algorithm is executed to find k mostrelevant leaf nodes which are stored in an RList If thecorrelation score between the query vector and the internalnode vector is smaller than the minimum score in the RListthe subtree of the internal node does not need to be searchedThus this scheme can also achieve sublinear efficiency

On the other hand some works have been proposed forencrypted image retrieval In [16] a privacy-enhanced facerecognition is realized with a help of Paillier homomorphicencryption (HE) The drawback is that the adoption of HEincurs heavy cost of computation and communication Tobe practical Lu et al propose a secure content-based imageretrieval (CBIR) scheme based on featureindex randomiza-tion [17] or min-Hash [5] Meanwhile the performance com-parison between homomorphic encryption and distance-preserving randomization is studied in [18] Due to the one-way and binary property of hash code secure CBIR thatexploits the hash function to encrypt features is effective andefficient in the large-scale database [6] However the accesspattern is leaked in the above schemes To solve this issueWeng et al [7] omit certain bits of the hash code of queryimage As a result the cloud returns all possible candidates tothe user Therefore the query pattern and access pattern areprotected But the user is involved to compare the features ofcandidates and obtain an accurate result Also it is difficult togenerate hash codes that uniformly distributed in the featurespace In further under the vector space model there is onlya handful of works that support efficient index structure

For example Xia et al [19] use local-sensitive hash (LSH) toconstruct a prefilter table but a refinement of the candidateresults is also a linear comparison Thus it just achievessublinear search efficiency Yuan et al [20] employ k-means tobuild an index tree Since k-means is not a balanced clusteringalgorithm it is inevitable to generate an index tree of skewedhierarchies Subsequently due to uneven depth in differentparts of index tree the search efficiency tends to be sublinearIn short under the vector space model the requirementfor secure and efficient image retrieval mechanisms remainsopen up to date

Besides the above schemes that based on the encryptedfeature the secure retrieval scheme directly based onencrypted images has been proposed It is designed for thewidely used JPEG image For example Zhang and Chengfirst use DC histogram preserved in the encrypted image as afeature [21] and then exploit AC histogram [22] and a newblock feature [23] for retrieval However the efficiency oflinear comparison or block-wise comparison is unacceptableTherefore the construction of secure index tree is necessary

In this paper we propose an efficient image retrievalschemeover encrypted cloud data which based on a balancedindex tree ie balanced binary tree (BBT) We introducea novel balanced binary clustering (BBC) algorithm anduse integrated features to build the BBT Owing to the factthat this paper realizes similarity search we tend to utilizeglobal feature (such as color shape) instead of accuratelocal feature (such as SIFT BRIEF) Hence our integratedfeature is composed of widely used HSV histogram andDCT histogram [5 21 22 24] We focus on the constructionof index tree and protection of query pattern and accesspattern As for the image encryption we adopt partial imageencryption technique [23 25 26] for JPEG format imageand mature cryptographic ciphers (such as AES) for theother format images Note that the key distribution is anindependent issue and out of the scope of this paper Ourmain contributions lie in three aspects

(1) We combine HSV histogram in the color space andDCT histogram in the transform domain as an integratedfeature To improve the search efficiency we construct a BBTfor the image database The BBT is built from a top-downapproach Since the hierarchical index tree is approximatelybalanced the scheme can achieve logarithmic search time

(2) By incorporating secure inner product (ie ASPE)the BBT is encrypted as secure BBT (SBBT) by a secretkey and query feature is encrypted The database image isencrypted by another secret key After that the sensitivedata in the cloud is protected meanwhile the searchablefunctionality of the encrypted feature is still valid Furtherthe random number extended in the query feature can blindthe query pattern

(3) To blind the search result we copy the database andmerge the subtree to rebuild SBBT The reduced SBBT andthe weighted factor of the integrated feature make the searchresult of one query image not unique Therefore the queryprivacy about access pattern is protected

The rest of this paper is organized as follows In Section 2we introduce system model threat model and design goalsAlso commonnotations are defined In Section 3 we propose

Security and Communication Networks 3

data owner authorized user

authorized user

encryptedqueryencrypted

index tree

encryptedimages

return

(access control key distribution)query encryption key amp image decryption keys

semi-trustedcloud server

Figure 1 A secure and efficient image retrieval scheme over encrypted data

BBC algorithm use BBC to build a BBT and adopt ASPE toencrypted BBT Section 4 presents and discusses experimen-tal results Finally we conclude the paper in Section 5

2 Problem Formulation

21 SystemModel The system contains three parties the dataowner the cloud server and authorized users as illustrated inFigure 1

The owner first extracts feature vectors such as HSVhistogram and DCT histogram from each image in hisdatabase They are concatenated into an integrated featureAfter the feature extraction the plaintext image is encryptedas the ciphertext by a secret key For improving searchefficiency the owner builds an index tree based on theintegrated features Then he utilizes another secret key toencrypt the index tree The leaf node of the secure index treeis associated with the encrypted image Afterward the owneroutsources the encrypted database images and the secureindex tree to the cloud In addition the owner sends secretkeys to each authorized user through a secure channel

The authorized user extracts feature vectors from thequery image To fetch encrypted images in the secure indextree he encrypts the integrated feature of a query image andsubmits it to the cloud

The cloud stores the encrypted database images andsecure index tree from the owner After receiving anencrypted query feature the cloud executes the search opera-tion in the secure index tree After that he returns encryptedimages that associated with retrieved leaf node to the user

22 Thread Model We assume that three parties follow asemihonest model in which any two parties do not colludeto share information However even if the cloud does not

know the content of the encrypted query it may count thequery frequency of same encrypted request and the accessfrequency of the same returned result Thus depending onthe available information to the cloud we consider two threatmodels here

Known Ciphertext Model The cloud only knowsencrypted database secure index tree and the encryptedquery

Known Background Model In this stronger model thecloud knows background knowledge about the frequencydistribution of the query images For example Trumprsquosphotos are very popular in the presidential campaign Thecloud can conduct a statistical attack to identify certain imagevia the access frequency of the returned images

23 Design Goals To enable secure image retrieval overencrypted cloud data under the above models our schemeshould achieve the following design goals

Effectiveness The scheme is designed to support searchover encrypted data The result is as accurate as the search inthe plaintext domain

Efficiency Under vector space model we structure thefeature vectors of image database as an SBBT Instead oflinear or sublinear search efficiency our scheme will achievelogarithmic search time

Privacy The goal is to protect the privacy of ownerand user which is summarized as follows (1) The databaseand its index tree confidentiality namely the image and itsindex tree exist in the ciphertext form in the cloud (2) Thequery privacy it is not possible to determine whether tworequests are from the same query image Note that to ease theoverload of computation and communication our goals arenot to fully protect access pattern ie the search path in theSBBT

4 Security and Communication Networks

24 Notations Assume that the total number of images in thedatabase is119873 We define common notations as follows

(i) D the database namely the set of images

(ii) C the ciphertext form of D stored in the cloud

(iii) p1119894 the normalized feature such as HSV histogramfrom the 119894th image 1 le 119894 le 119873 it is a 1198891-dimensionalrow vector

(iv) p2119894 the normalized feature such as DCT histogramfrom the 119894th image 1 le 119894 le 119873 it is a 1198892-dimensionalrow vector

(v) 997888rarrp 119894 the integrated feature 997888rarrp 119894 = [120572p1119894 (1 minus 120572)p2119894 ] aweighted factor 120572 isin [0 1]

(vi) k the index vector in each node of BBT it is generatedfrom 997888rarrp119894

(vii) I the119867minusheight BBT for the whole image database D119867 asymp lceillog1198732 rceil(viii) V the ciphertext index which corresponds to plain-

text k

(ix) T the ciphertext form of I

(x) q1 the normalized feature of the query image such asHSV histogram

(xi) q2 the normalized feature of the query image such asDCT histogram

(xii) 997888rarrq the query vector 997888rarrq = [120573q1 (1 minus 120573)q2] a weightedfactor 120573 isin [0 1]

(xiii) M the encryption key M isin R(119889+1)times(119889+1) is a randominvertible matrix 119889 = 1198891 + 1198892

(xiv) Q the ciphertext query corresponding to plaintext 997888rarrqTo find the k-nearest neighbor of the query image we

adopt ldquoinner product similarityrdquo to evaluate the distanceamong the features Specifically 997888rarrp 119894 is an integrated featureof the database image including HSV histogram and DCThistogram 997888rarrq is the integrated feature of the query imageThefinal similarity score is obtained by997888rarrp 119894 ∙997888rarrq Based on the scorethe most relevant images in the database are returned to theuser

3 Proposed Scheme

In order to achieve logarithmic search efficiency we firstbuild a balanced index tree for the plaintext image databasein which a novel balanced binary clustering algorithm isproposed Then to ensure the confidentiality of index vectork and query vector 997888rarrq the secure inner product computationis employed to encrypt the vector k and 997888rarrq Also we adopt apartial image encryption to encrypt database images Finallywe improve the scheme to enhance privacy protection in theknown background model

31 BBC Algorithm Balanced Binary Clustering To buildthe balanced index tree we need a balanced binary clus-tering (BBC) algorithm to find a separating vector k whichdivides each dataset into two equal (or nearly equal) clustersHowever traditional k-means aims at finding cluster centersso that the sum of the distance between each point andits nearest center is maximized rather than the equal sizeof clusters Thus frequency sensitive competitive learning(FSCL) [27] is introduced to improve k-means FSCL is aconscience type algorithm that takes the size of clusters asa weight factor which makes the bigger cluster less likely towin more points in next iteration It can converge to a localminimum [28] Although the difference of cluster centerscan be used as a separating vector k FSCL does not ensurethat the k clusters are equally sized In fact it is experiencedas very unstable in high-dimensional vector spaces [29] Onthe contrast balanced k-means [30] can split a dataset intotwo equal-sized clusters by finding a perfect matching ina bipartite graph But it does not generate the separatingvector k Thus we propose a novel BBC to find a separatingvector that makes the size of two clusters equal To our bestknowledge this work is the first endeavor to combine FSCLand balanced k-means to achieve balanced clustering Thedetail of BBC algorithm is elaborated in the following

Given 119899 points 997888rarrp 119894119899119894=1 isin R119889 at the 119905th iteration the sizeof cluster is 120591(119905)

119896ge 0 and the cluster center is C(119905)

119896 119896 = 1 2

The BBC algorithm works as followsInitializationWeadopt k-means++ to choose twopoints

from 997888rarrp 119894119899119894=1 as initial centers C(1)119896

119896 = 1 2First uniformly choose an initial center C(1)1 from 997888rarrp 119894119899119894=1Second choose the next initial center C(1)2 from 997888rarrp 119894 997888rarrp 119894 =

C(1)1 119899119894=1 with probability D⟨997888rarrp 119894 C(1)1 ⟩ sum119863⟨997888rarrp 119894 C(1)1 ⟩119863⟨997888rarrp 119894 C(1)1 ⟩ = 1 minus (997888rarrp 119894 ∙ C(1)1 )997888rarrp 119894C(1)1 Finally according to the distance to the initial centers

other points are divided into two categories 119862119871119896 the size ofwhich is 120591(1)

119896 119896 = 1 2

Cluster Assignment Let new 119879(119905)119894119896

be the solution ofoptimization problem as follows

min 119869 (119879(119905)119894119896 ) = 2sum119896=1

(120591(119905)119896 sdot 119899sum119894=1

119879(119905)119894119896119863⟨997888rarrp 119894 C(119905)119896 ⟩) (1)

st

119879(119905)119894119896 isin 0 1 119894 = 1 119899 119896 = 1 22sum119896=1

119879(119905)119894119896 = 1 119894 = 1 119899119899sum119894=1

119879(119905)119894119896 ge lfloor1198992rfloor 119896 = 1 2(2)

where119863⟨997888rarrp119894 C(119905)119896 ⟩ = 1minus(997888rarrp 119894 ∙C(119905)119896 )997888rarrp 119894C(119905)119896 A smaller valueof119863⟨997888rarrp 119894 C(119905)

119896⟩ indicates that the two vectors 997888rarrp 119894 C(119905)

119896are more

similar

Security and Communication Networks 5

Inspired by balanced k-means we first turn (1) as a linearassignment problem (LAP) [31] The cost matrix is defined as

119862119864 =[[[[[[[[[[

119863⟨997888rarrp 1 C(1)119896

⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(1)119896

⟩119863⟨997888rarrp 1 C(2)119896 ⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(2)119896 ⟩

sdot sdot sdot119863 ⟨997888rarrp 1 C(1)

119896⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(1)

119896⟩

119863⟨997888rarrp 1 C(2)119896

⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(2)119896

]]]]]]]]]]

(3)

where119862119864 isin R2lceil1198992rceiltimes119899 If it is not a square matrix a column ofzeros is added to make it square Then we adopt Volgenant-Jonker (VJ) algorithm [32 33] to find a perfect matching in abipartite graph Finally if the vector 997888rarrp 119894 is assigned to centerC(119905)119896 119879(119905)119894119896

= 1 and 0 otherwiseCluster Update Update the cluster centers C(119905+1)

119896 the

ldquoselectionrdquo variables 119879(119905+1)119894119896 and the size of clusters 120591(119905+1)119896 asfollows

C(119905+1)119896 = sum119899119894=1 119879(119905)119894119896997888rarrp 119894sum119899119894=1 119879(119905)119894119896 119896 = 1 2119879(119905+1)1198941 = 1119879(119905+1)1198942 = 0

if 119863⟨997888rarrp 119894 C(119905+1)1 minus C(119905+1)2 ⟩ le 0119879(119905+1)1198941 = 0119879(119905+1)1198942 = 1

if 119863⟨997888rarrp 119894 C(119905+1)1 minus C(119905+1)2 ⟩ gt 0120591(119905+1)119896 = 119899sum

119894=1

119879(119905+1)119894119896 119896 = 1 2

(4)

Cluster assignment and cluster update are repeated until120591(119905+1)119896

le 2lceillog1198992rceilminus1 or C(119905+1)119896

= C(119905)119896 119896 = 1 2

Output The above three steps run several times withdifferent initial centers C(1)1 C(1)2

Owing to the fact that different initial centers maygenerate different results the result that the sizes of twoclusters are most similar is chosen as a final output The sizesimilarity of two clusters is defined as

119904 (120591(119905+1)119896 ) =

0 prod119896=12

120591(119905+1)119896

= 0min 120591(119905+1)

119896 119896 = 1 2

max 120591(119905+1)119896

119896 = 1 2 prod119896=12

120591(119905+1)119896

= 0 (5)

Finally the output is997888rarrp 119894 isin 119862119871119896 if 119879(119905+1)119894119896 = 1 119894 = 1 119899 119896 = 1 2k = [ C(119905)110038171003817100381710038171003817C(119905)1

10038171003817100381710038171003817 minus C(119905)210038171003817100381710038171003817C(119905)210038171003817100381710038171003817] (6)

The dataset of 119899 points can be divided into two equal (nearlyequal) clusters by the separating vector k

32 UBBT Unencrypted Balanced Binary Tree In thissubsection we build the BBT for the whole database Firstlywe generate an integrated feature 997888rarrp 119894 for each image in thedatabase Then an index vector k of the root node or internalnode of the BBT is generated by a top-down approach BBCalgorithm generates k to separate image features into twoclusters of nearly equal size ie 1198621198711 1198621198712 Namely

997888rarrp 119894 ∙ k ge 0 if 997888rarrp 119894 isin 1198621198711997888rarrp 119894 ∙ k lt 0 if 997888rarrp 119894 isin 1198621198712 (7)

Algorithm 1 recursively divides feature vectors of the databaseimages into two clusters until no cluster has more thantwo feature vectors The hierarchical index tree is shown inFigure 2 The level of the root node is 0 and the level of theleaf node is119867 Thus the height of entire index tree is119867 Thesubtree that the root node is at ℎth level of the index tree hasthe height of119867minusℎ The root and internal node 119906 in our BBTare defined as follows

119906 = ⟨119868119863 k 119906119871 119906119877⟩ (8)

where 119868119863 is the identity of node k is the index vector dividingimages on the leaf nodes into two clusters 1198621198711 and 1198621198712 and119906119871 and 119906119877 are the pointers to the left and right child node If119906 is a leaf node it is defined as

119906 = ⟨119868119863 119894119871 119894119871⟩ (9)

where 119894119871 points to the imagesThe search process starts from the root node of UBBT

If 997888rarrq ∙ k ge 0 we execute iterative retrieval upon the leftsubtree otherwise upon the right subtree until obtaining theleaf node

33 SBBT Secure Balanced Binary Tree InUBBT scheme theleaf node is retrieved by the sign of the inner product 997888rarrq ∙k In order to protect the sensitive data 997888rarrq and k we use thesecure inner product (ie ASPE) [8] to encrypt the vectorsand obtain the same result as the plaintext

First of all the owner generates a random invertiblematrix M isin R119889times119889 to encrypt k The user also use it to encrypt997888rarrq ie

V = kMminus1

Q = M997888rarrq T (10)

where V is the ciphertext index and Q is the ciphertext queryIn addition the JPEG images are encrypted via a partialimage encryption [22 25]Therefore index tree I is encryptedas T The leaf node of T is associated with the encryptedimages Then they are outsourced to the cloud for secureimage retrieval When retrieving the user submits Q of thequery image to the cloud Since VQ = 997888rarrq ∙ k the cloud can

6 Security and Communication Networks

Input 997888rarrp 119894 1 le 119894 le 119873 Balanced Binary Clustering ieBBC

Output Balanced Binary Tree I(1) ℎ = 0 the height of node(2) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 1198941le119894le119873) 997888rarr root node(3) function 119862ℎ119894119897119889119903119890119899119866119890119899 (kectors 997888rarrp 119894)(4) 119861119861119862 (997888rarrp 119894) 997888rarr two clusters1198621198711 1198621198712

separating vector as the index k(5) if the size of 1198621198711 ge 2 then(6) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198711) 997888rarr internal node(7) else(8) 119894th image119894isin1198621198711 997888rarr leaf node(9) end if(10) if the size of 1198621198712 ge 2 then(11) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198712) 997888rarr internal node(12) else(13) 119894th image119894isin1198621198712 997888rarr leaf node(14) end if(15) ℎ = ℎ + 1(16) return k 1198621198711 1198621198712(17) end function

Algorithm 1 Building balanced binary tree

leaf node H

internal node h

root node 0

internal node 1

v

CL1 CL2

subtree

Figure 2 The hierarchical structure of unencrypted BBT

use Q to execute an effective search on secure index tree TTherefore the retrieved result and efficiency in the ciphertextwould always be in line with that in the plaintext Note thatthe more details of ASPE are described in [8]

Unfortunately the relationship between 997888rarrq and Q isdeterministic According to the same Q or VQ the cloud cancount the query frequency of 997888rarrq Further according to thesame result the cloud can also count the access frequencyof 997888rarrq Therefore if the query image and 997888rarrq is a one-to-onerelationship the proposed scheme cannot resist the statisticalattack in the known background model

To break the one-to-one relationship between the queryimage and ciphertext query Q we extend 997888rarrq and k as follows

997888rarrq = 120579 [120574 997888rarrq ]k = [0 k] (11)

where the random numbers 120579 gt 0 120574 ≫ 1 Thus thesame query image can generate different ciphertext queryQ by setting 120579 and 120574 Meanwhile VQ = 120579(997888rarrq ∙ k) isblinded by positive 120579The cloud cannot get available statistical

Security and Communication Networks 7

(a) (b) (c)

Figure 3 (a) a query image (b) one encrypted version and (c) another encrypted version

(a) (b) (c) (d)

(e) (f) (g) (h)

Figure 4 In Corel database [10] except for (a)(e) a query image the most similar image by the retrieval based on (b)(f) HSV histogram(c)(g) DCT histogram or (d)(h) an integrated feature

information from the ciphertext query Q and the result VQHowever the sign of VQ remains the same even if 120574 and 120579 aredifferent so the retrieved result and search path remain thesameThe request frequency of a query image can be inferredfrom the access frequency of the same return image

To overcome the issue we first copy the image databaseThe owner uses partial image encryption [23] to protectplaintext image According to different encryption keys sameplaintext image can be encrypted as different encryptedimages as shown in Figure 3 Meanwhile the owner uses theembedding key to hide which encryption key was used [25]Then the owner shares encryption keys and the embeddingkey with authorized users through a secret channel Afterobtaining an encrypted image the user uses the embeddingkey to extract hidden data and select true encryption key torecover the plaintext image

Second different encrypted images derived from thesame plaintext image correspond to different integratedfeatures For the diversity of the integrated feature we useHSV histogram and DCT histogram as basic features Thereason is that HSV histogram in color space and DCThistogram in transform domain are image features fromdifferent perspectives As shown inFigure 4 different features

of the query image can still retrieve a similar image Forease of description we define the new ciphertext databaseas C1015840 Then we set C corresponding to the feature 997888rarrp 119894 =[1205721p1119894 (1 minus 1205721)p2119894 ] and C1015840 corresponds to the feature 997888rarrp 1198941015840 =[1205722p1119894 (1minus1205722)p2119894 ] Finally we use proposed BBC algorithm tobuild an index tree I from 997888rarrp 119894 997888rarrp 1198941015840 C C1015840 and employ ASPEto encrypt the plaintext tree as a ciphertext tree T

To further enhance query privacy we reduce the heightof the index tree T and return multiple encrypted images tothe query user at the one time Specifically we set the internalnode to be a new leaf node Then the new leaf node containsall the encrypted images in the subtree that the internal nodeof T is a root node Due to the fact that the index tree isbalanced leaf nodes of reduced index tree can always pointto multiple encrypted images Please note that some imagesin a subtree are similar which is a compromise with effectiveretrieval

34 Security Analysis We analyze SBBT concerning theprivacy requirements in the design goals

(1) The database and its index are confidential In SBBTindex vectors k of BBT are encrypted as V by the high-dimensional randommatrix which is kept confidential to the

8 Security and Communication Networks

2 3 4 5 6 7 8 9 101 of Category

005

01

015

02

025

03

035

04

Aver

age C

orre

latio

n of

HSV

0

001

002

003

004

095

Aver

age C

orre

latio

n of

DCT

2 3 4 5 6 7 8 9 101 of Category

corrminusintracorrminusintercorrminusdiff

corrminusintracorrminusintercorrminusdiff

Figure 5 The average correlation of HSV histogram (left) and DCT histogram (right) under same category and different categories

cloud For images in the database the partial encryption canprotect their confidentiality [23 25]

(2) Since the feature vector 997888rarrq of the query image isextended by random numbers 120574 and 120579 it can generatedifferent ciphertext queries Q The cloud cannot identify thesame query image from its different ciphertext queries Thequery pattern is protected

(3) According to different 120573 different features of samequery image can retrieve different images When 120573 is equal to1205721 or 1205722 different results can be recovered to the same queryimage Thus the retrieved result and path of the same queryimage are not unique but the retrieval is still effective Thecloud cannot link the different returned image to the samequery image The access pattern is protected

(4) In addition since the height of SBBT is reduced oneleaf node will point to multiple encrypted images It furtherbreaks the one-to-one relationship between the query imageand the return imagesThe access pattern is further protected

In short the SBBT is secure in the known ciphertextmodel and can resist the statistical attack in the knownbackground model

4 Performance Analysis

In this section we show the experimental results andefficiency analysis of the proposed scheme As we knowthat JPEG is the most commonly used image format andaccounting for up to 95 of images on the web [34] sowe adopt the Corel database [10] for the similarity searchThe database contains 1000 JPEG images which belong to 10categories African Beach Architecture Buses DinosaursElephants Flowers Horses Mountain and Food Each ofthem includes 100 images As [5] we split an image into 256blocks and extract hue saturation and value of brightness

of each block Then we perform hierarchical clustering ofblock features to construct the ldquobag of wordrdquo model Theusage frequency of the visual word is the HSV histogramof the image As [22] DCAC coefficients of image blocksare quantized into different bins and the DCT histogram isthe usage frequency of quantized DCAC coefficients Thusthe color feature p1119894 and texture feature p2119894 are set as the991-dimensional HSV histogram and 780-dimensional DCThistogram respectively

41 Parameters Setting First of all we should set parameters1205721 of 997888rarrp 119894 and 1205722 of 997888rarrp 1198941015840 to construct an index tree Becausethe integrated features compose of HSV histograms and DCThistogram we first evaluate average histogram similarity ofthe images in the same category and different categories(denoted as corr-intra and corr-inter respectively) as shownin Figure 5 We can see that the value of corr-intra is largerthan corr-inter thus images of different categories can bedistinguished by each histogram According to the differencebetween corr-intra and corr-inter (ie corr-diff in Figure 5)we set 1205721 = 01 for the integrated feature 997888rarrp 119894 of the databaseC The purpose is to make the HSV histogram and DCThistogram have the same contribution to the search results

For the parameter 1205722 of 997888rarrp 1198941015840 we hope that the index treeof database C C1015840 can better protect the access pattern Weuse two different metrics to measure the privacy protectionachieved by different 1205722 The two metrics are the number ofdifferent image categories in each subtree and the categoryentropy of each subtree Figure 6 shows the average numberof image categories contained in the subtree of differentlevels Because the index tree of the database based ondifferent1205722 has same height of 11 (ie lceillog20002 rceil) the proposedBBC algorithm performs well However we can see from

Security and Communication Networks 9

Table 1 Average category entropy of each level

Parameter Level of the Index Tree1205722 1 2 3 4 5 6 7 8 9 10 1102 299 249 21 175 138 107 081 062 044 021 003 32 282 207 181 143 118 091 07 048 024 004 332 288 228 18 153 128 102 078 054 028 005 332 303 231 183 156 127 103 079 055 03 006 332 291 237 196 165 141 111 089 061 032 007 332 26 21 171 146 124 103 081 057 029 008 332 292 232 185 152 126 104 081 057 03 009 332 292 225 178 149 127 103 083 058 03 0

0

2

4

6

8

10

Aver

age N

umbe

r of C

ateg

orie

s

2 3 4 5 6 7 8 9 10 111Level of Index Tree

2 = 02

2 = 03

2 = 04

2 = 05

2 = 06

2 = 07

2 = 08

2 = 09

Figure 6 The average number of image categories

Figure 6 that the subtree near the leaf node contains a similarnumber of image categories We will use category entropy tomeasure the privacy The category entropy which is definedas minussum119894(119888119894sum119894 119888119894) log2(119888119894sum119894 119888119894) (119888119894 is the number of existing119894 category) is used to measure the randomness of imagecategories in the subtree From Table 1 we know that theindex tree of 1205722 = 06 has the highest average entropy at the3th-10th level even its average category number at 3th-6thlevel is not the biggest in Figure 6 Thus we set the weightedfactor 1205722 = 06 for the integrated feature 997888rarrp 1198941015840 of database C1015840

42 Index Tree Construction In order to prove that ouralgorithm can classify feature vectors in a balanced mannerwe use average similarity of the cluster size to comparethe clustering performance of proposed BBC algorithmbalanced k-means (ie Bkmeans) k-means++ and FSCLThe average similarity based on (5) is defined as 119878 = 119904(120591(119905+1)

119896)

BBCBkmeans

Kmeans++FSCL

0

005

07

075

08

085

09

095

1

Sim

ilarit

y of

Clu

ster S

ize (

S )

5 10 150Level of Index Tree

Figure 7 The average size similarity of two clusters

FromFigure 7 we can see that the index trees generated by theproposed BBC algorithm Bkmeans k-means++ and FSCLhave the height of 11 12 13 and 16 respectively The smallerthe height of the index tree indicates that the proposedalgorithm can classify feature vectors into two clusters moreevenly at each level of the index tree Figure 7 shows thatthe cluster size of FSCL is the most unbalanced at 1th-11thlevels There are two reasons the first is that the change ofthe weight factor affects the stability of the clustering resultand the second is that the value of weight factor impactsthe distance between the cluster center and other featurevectors On contrast due to the use of conscience weightfactor the proposed algorithmperforms better than Bkmeansand k-means++ and the result is much better than FSCLThe reason is that the bipartite graph matching mechanismconstrains a balance of the two clusters and restricts theinfluence of weight factor In short the weight factor is stilleffective in improve the balance of clustering In fact aboveresult is also similar in the database of different 1205721 1205722

10 Security and Communication Networks

Table 2 Minimal Dunn index of each level

Algorithm Level of the Index Tree1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

BBC 047 001 001 002 001 001 001 001 001 002 infin - - - - -Bkmeans 047 001 002 001 002 002 002 003 004 024 073 infin - - - -Kmeans++ 021 001 001 001 001 001 001 001 002 003 022 105 infin - - -FSCL 047 004 005 006 007 01 015 019 023 038 028 058 062 066 095 infin

Table 3 Time cost of index tree construction

Algorithm BBC Bkmeans Kmeans++[20] FSCLTime (s) 111 2316 1749 254

In further we employ Dunn index (DI) [35] to evaluatementioned clustering algorithms The Dunn index is definedas

119863119868 = minforall119909isin1198621198711forall119910isin1198621198712 119863 ⟨119909 119910⟩max119896=12 maxforall119909119910isin119862119871119896 119863 ⟨119909 119910⟩ (12)

where the numerator is the minimal distance of vectors indifferent clusters and the denominator is the largest within-cluster distance Thus we can know that when separatingtwo points into two clusters the denominator is zero andDI is infinite A high value of DI means good compactnessTable 2 shows the minimal DI of clustering results in eachlevel In the same level of the index tree the most of BBCrsquos DIis smaller than other algorithms (ie Bkmeans K-means++and FSCL) Meanwhile BBCrsquos DI of the last level is muchsmaller than other algorithms Obviously for the proposedBBC algorithm the cluster-size similarity is with higherpriority than clustering compactness

The above algorithms are implemented using MATLABon a PC with an Intel Core i5 32 GHz CPU and 16 GB RAMTime consumption for index tree construction is shown inTable 3 In particular we execute 30 times clustering withrandom initial centers and select the result of highest scoreof cluster-size similarity Bkmeans is most time consumingbecause it is not designed to find a vector that splits a datasetevenly In contrast proposed BBC costs the least time

43 Retrieval Performance To evaluate the retrieval perfor-mance each image in the Corel database is selected as a queryimage The retrieval performance is evaluated by averageprecision-recall (PR) curve ie

119875119903119890119888119894119904119894119900119899 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119903119890119905119906119903119899119890119889 119894119898119886119892119890119904

119877119890119888119886119897119897 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119901119900119904i119905119894V119890 119894119898119886119892119890119904

(13)

Figure 8(a) compares the PR curves of the query feature basedon different 120573 values It can be seen that the performanceof different 120573 achieves high precision when recall belongsto (0 005) Besides when only one image is returned ierecall=1 the precision of 120573 = 01 or 06 is 100The reason

is thatwe set120573 as1205721 or1205722 Although two integrated features ofa query image are different the retrieved results are differentencrypted versions of the same plaintext image Since thequery features the corresponding results and search pathsare different the statistical information of query image isprotected

Figure 8(b) shows the retrieval performance of thereduced index trees with different heights It can be seen thatthe precision increases with the height of index tree even if 120573is set as 0 or 1 Further Figure 9 shows the average maximumheight of a reduced index tree inwhich the returned leaf nodecontains the exact query image It can be seen that the heightof reduced index tree is 11 when 120573 is set as 01 or 06 If 120573 is setcloser to 01 or 06 the height of reduced index tree is nearerto 11 When the height is 9 the most of exact query images areretrievable in the reduced index tree and one search returnsfour (22) images Fortunately as shown in Figure 8(b) theprecision is about 70 at the 9th level of the index tree Thusthe retrieval is still effective

44 Privacy Protection We consider the privacy protectionfrom two aspects query pattern and access pattern Since thequery feature is an integrated histogram ie 997888rarrq = [120573q1 (1 minus120573)q2] Different 120573 correspond to different query features Inaddition even if 120573 is determined the feature will be extendedby random numbers 120579 120574 as 997888rarrq = 120579[120574 997888rarrq ] Thus the encryptedfeature is Q = M(120579[120574 120573q1 (1 minus 120573)q2])T Obviously theencrypted feature of a query image is not unique The querypattern of each query image is protected

To protect access pattern the retrieved result of a queryimage should be not uniqueWehave built an index tree basedon features of original database (1205721 = 01) and features of thecopied database (1205722 = 06) Then the integrated features ofdifferent 120573 are used to retrieve in the index tree We know thefact that the returned result is a leaf node of the index treeand different leaf nodes indicate different access paths So wedefine a path privacy to evaluate the protection level of accesspattern

119875119903119894V119886119888119910 = 119900119891 119889119894119891119891119890119903119890119899119905 119904119890119886119903119888ℎ 119901119886119905ℎ119904 119900119891 119886119897119897 119904119890119886119903119888ℎ 119901119886119905ℎ119904 (14)

Different search path can be obtained by using the queryfeature of different 120573 Figures 10(a) and 10(b) show the path

Security and Communication Networks 11

=0=0025=005=0075=01=03

=035=04=06=08=1

40

50

60

70

80

90

100Pr

ecisi

on (

)

5 10 15 20 25 300Recall ()

(a)

0 2 4 6 8 10 120

20

40

60

80

100

Height of Index Tree

Prec

ision

()

=0=0025=005=0075=01=03

=035=04=06=08=1

(b)

Figure 8 (a) Precision-recall curves of original index tree (b) Retrieval precision of reduced index trees

0 0025 005 0075 01 03 035 04 06 08 1

Parameter

8

9

10

11

12

Hei

ght o

f Red

uced

Inde

x Tr

ee

Figure 9 Under different 120573 the average maximum height of areduced index tree that the query image can retrieve itself

privacy of different 120573 where 120573 = 01 and 06 are baselinesrespectively It can be seen that the path privacy is 100when120573 is set as 06 in Figure 10(a) and 01 in Figure 10(b) Thereason is that retrieval is separately carried out in the left orright subtree of the root node Figure 10(a) also shows thatsearch paths are in the same subtree of the root node when120573 is set to 06 08 or 1 But the search path of 120573 = 08 or1 is still different with 120573 = 06 as shown in Figure 10(b)Similarly when 120573 is set to 0 00025 005 0075 01 03 or035 the search path is in the same subtree of the root nodebut still different Besides when 120573 is set to 04 the searchtraverses in both subtrees of the root node In addition thepath privacy increases with the height of the tree In short the

access pattern can be protected by setting different 120573 Pleasenote that usage frequency of different 120573 can be optimizedbased on the frequency distribution of the query images

In summary Table 4 compares the proposed scheme withother schemes in the terms of search efficiency query patternprotection and access pattern protection We compare thesearch efficiency of different schemes by theoretical analysisinstead of experiments The reason is that encrypted imageretrieval schemes are based on different image databasesaccording to the application Also some schemes utilizeencrypted image feature but other schemes employ the hashcode of the image feature

In the schemes under vector space model such as Yuan[20] a skew index tree leads to119874(log119873 sim 119873) search time Xia[19] uses local-sensitive hash (LSH) to construct a prefiltertable Then with corresponding image feature the cloudcarries out a refinement of candidate results Thus it is asublinear scheme Cheng [22 23] does not build an indextree so the retrieval efficiency is 119874(119873) Weng [6 7] usesa hash table to improve the search efficiency (ie 119874(1 sim119873)) In short the index tree or hash table is a trade-offbetween space and efficiency Although Wengrsquos scheme [7]and proposed scheme both protect query pattern and accesspattern the hash code and feature vector are respectivelyused SpecificallyWeng [7] omits certain bits of the hash codeof query image and returns multiple candidates to the userfor refinement The user is involved in completing retrievaland obtaining the accurate result In our scheme the usercan set 120573 as 1205721 or 1205722 to retrieve exact query image throughdifferent paths Unlike [7] our scheme does not need the userto participate in the feature comparison However to protect

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 3: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

Security and Communication Networks 3

data owner authorized user

authorized user

encryptedqueryencrypted

index tree

encryptedimages

return

(access control key distribution)query encryption key amp image decryption keys

semi-trustedcloud server

Figure 1 A secure and efficient image retrieval scheme over encrypted data

BBC algorithm use BBC to build a BBT and adopt ASPE toencrypted BBT Section 4 presents and discusses experimen-tal results Finally we conclude the paper in Section 5

2 Problem Formulation

21 SystemModel The system contains three parties the dataowner the cloud server and authorized users as illustrated inFigure 1

The owner first extracts feature vectors such as HSVhistogram and DCT histogram from each image in hisdatabase They are concatenated into an integrated featureAfter the feature extraction the plaintext image is encryptedas the ciphertext by a secret key For improving searchefficiency the owner builds an index tree based on theintegrated features Then he utilizes another secret key toencrypt the index tree The leaf node of the secure index treeis associated with the encrypted image Afterward the owneroutsources the encrypted database images and the secureindex tree to the cloud In addition the owner sends secretkeys to each authorized user through a secure channel

The authorized user extracts feature vectors from thequery image To fetch encrypted images in the secure indextree he encrypts the integrated feature of a query image andsubmits it to the cloud

The cloud stores the encrypted database images andsecure index tree from the owner After receiving anencrypted query feature the cloud executes the search opera-tion in the secure index tree After that he returns encryptedimages that associated with retrieved leaf node to the user

22 Thread Model We assume that three parties follow asemihonest model in which any two parties do not colludeto share information However even if the cloud does not

know the content of the encrypted query it may count thequery frequency of same encrypted request and the accessfrequency of the same returned result Thus depending onthe available information to the cloud we consider two threatmodels here

Known Ciphertext Model The cloud only knowsencrypted database secure index tree and the encryptedquery

Known Background Model In this stronger model thecloud knows background knowledge about the frequencydistribution of the query images For example Trumprsquosphotos are very popular in the presidential campaign Thecloud can conduct a statistical attack to identify certain imagevia the access frequency of the returned images

23 Design Goals To enable secure image retrieval overencrypted cloud data under the above models our schemeshould achieve the following design goals

Effectiveness The scheme is designed to support searchover encrypted data The result is as accurate as the search inthe plaintext domain

Efficiency Under vector space model we structure thefeature vectors of image database as an SBBT Instead oflinear or sublinear search efficiency our scheme will achievelogarithmic search time

Privacy The goal is to protect the privacy of ownerand user which is summarized as follows (1) The databaseand its index tree confidentiality namely the image and itsindex tree exist in the ciphertext form in the cloud (2) Thequery privacy it is not possible to determine whether tworequests are from the same query image Note that to ease theoverload of computation and communication our goals arenot to fully protect access pattern ie the search path in theSBBT

4 Security and Communication Networks

24 Notations Assume that the total number of images in thedatabase is119873 We define common notations as follows

(i) D the database namely the set of images

(ii) C the ciphertext form of D stored in the cloud

(iii) p1119894 the normalized feature such as HSV histogramfrom the 119894th image 1 le 119894 le 119873 it is a 1198891-dimensionalrow vector

(iv) p2119894 the normalized feature such as DCT histogramfrom the 119894th image 1 le 119894 le 119873 it is a 1198892-dimensionalrow vector

(v) 997888rarrp 119894 the integrated feature 997888rarrp 119894 = [120572p1119894 (1 minus 120572)p2119894 ] aweighted factor 120572 isin [0 1]

(vi) k the index vector in each node of BBT it is generatedfrom 997888rarrp119894

(vii) I the119867minusheight BBT for the whole image database D119867 asymp lceillog1198732 rceil(viii) V the ciphertext index which corresponds to plain-

text k

(ix) T the ciphertext form of I

(x) q1 the normalized feature of the query image such asHSV histogram

(xi) q2 the normalized feature of the query image such asDCT histogram

(xii) 997888rarrq the query vector 997888rarrq = [120573q1 (1 minus 120573)q2] a weightedfactor 120573 isin [0 1]

(xiii) M the encryption key M isin R(119889+1)times(119889+1) is a randominvertible matrix 119889 = 1198891 + 1198892

(xiv) Q the ciphertext query corresponding to plaintext 997888rarrqTo find the k-nearest neighbor of the query image we

adopt ldquoinner product similarityrdquo to evaluate the distanceamong the features Specifically 997888rarrp 119894 is an integrated featureof the database image including HSV histogram and DCThistogram 997888rarrq is the integrated feature of the query imageThefinal similarity score is obtained by997888rarrp 119894 ∙997888rarrq Based on the scorethe most relevant images in the database are returned to theuser

3 Proposed Scheme

In order to achieve logarithmic search efficiency we firstbuild a balanced index tree for the plaintext image databasein which a novel balanced binary clustering algorithm isproposed Then to ensure the confidentiality of index vectork and query vector 997888rarrq the secure inner product computationis employed to encrypt the vector k and 997888rarrq Also we adopt apartial image encryption to encrypt database images Finallywe improve the scheme to enhance privacy protection in theknown background model

31 BBC Algorithm Balanced Binary Clustering To buildthe balanced index tree we need a balanced binary clus-tering (BBC) algorithm to find a separating vector k whichdivides each dataset into two equal (or nearly equal) clustersHowever traditional k-means aims at finding cluster centersso that the sum of the distance between each point andits nearest center is maximized rather than the equal sizeof clusters Thus frequency sensitive competitive learning(FSCL) [27] is introduced to improve k-means FSCL is aconscience type algorithm that takes the size of clusters asa weight factor which makes the bigger cluster less likely towin more points in next iteration It can converge to a localminimum [28] Although the difference of cluster centerscan be used as a separating vector k FSCL does not ensurethat the k clusters are equally sized In fact it is experiencedas very unstable in high-dimensional vector spaces [29] Onthe contrast balanced k-means [30] can split a dataset intotwo equal-sized clusters by finding a perfect matching ina bipartite graph But it does not generate the separatingvector k Thus we propose a novel BBC to find a separatingvector that makes the size of two clusters equal To our bestknowledge this work is the first endeavor to combine FSCLand balanced k-means to achieve balanced clustering Thedetail of BBC algorithm is elaborated in the following

Given 119899 points 997888rarrp 119894119899119894=1 isin R119889 at the 119905th iteration the sizeof cluster is 120591(119905)

119896ge 0 and the cluster center is C(119905)

119896 119896 = 1 2

The BBC algorithm works as followsInitializationWeadopt k-means++ to choose twopoints

from 997888rarrp 119894119899119894=1 as initial centers C(1)119896

119896 = 1 2First uniformly choose an initial center C(1)1 from 997888rarrp 119894119899119894=1Second choose the next initial center C(1)2 from 997888rarrp 119894 997888rarrp 119894 =

C(1)1 119899119894=1 with probability D⟨997888rarrp 119894 C(1)1 ⟩ sum119863⟨997888rarrp 119894 C(1)1 ⟩119863⟨997888rarrp 119894 C(1)1 ⟩ = 1 minus (997888rarrp 119894 ∙ C(1)1 )997888rarrp 119894C(1)1 Finally according to the distance to the initial centers

other points are divided into two categories 119862119871119896 the size ofwhich is 120591(1)

119896 119896 = 1 2

Cluster Assignment Let new 119879(119905)119894119896

be the solution ofoptimization problem as follows

min 119869 (119879(119905)119894119896 ) = 2sum119896=1

(120591(119905)119896 sdot 119899sum119894=1

119879(119905)119894119896119863⟨997888rarrp 119894 C(119905)119896 ⟩) (1)

st

119879(119905)119894119896 isin 0 1 119894 = 1 119899 119896 = 1 22sum119896=1

119879(119905)119894119896 = 1 119894 = 1 119899119899sum119894=1

119879(119905)119894119896 ge lfloor1198992rfloor 119896 = 1 2(2)

where119863⟨997888rarrp119894 C(119905)119896 ⟩ = 1minus(997888rarrp 119894 ∙C(119905)119896 )997888rarrp 119894C(119905)119896 A smaller valueof119863⟨997888rarrp 119894 C(119905)

119896⟩ indicates that the two vectors 997888rarrp 119894 C(119905)

119896are more

similar

Security and Communication Networks 5

Inspired by balanced k-means we first turn (1) as a linearassignment problem (LAP) [31] The cost matrix is defined as

119862119864 =[[[[[[[[[[

119863⟨997888rarrp 1 C(1)119896

⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(1)119896

⟩119863⟨997888rarrp 1 C(2)119896 ⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(2)119896 ⟩

sdot sdot sdot119863 ⟨997888rarrp 1 C(1)

119896⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(1)

119896⟩

119863⟨997888rarrp 1 C(2)119896

⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(2)119896

]]]]]]]]]]

(3)

where119862119864 isin R2lceil1198992rceiltimes119899 If it is not a square matrix a column ofzeros is added to make it square Then we adopt Volgenant-Jonker (VJ) algorithm [32 33] to find a perfect matching in abipartite graph Finally if the vector 997888rarrp 119894 is assigned to centerC(119905)119896 119879(119905)119894119896

= 1 and 0 otherwiseCluster Update Update the cluster centers C(119905+1)

119896 the

ldquoselectionrdquo variables 119879(119905+1)119894119896 and the size of clusters 120591(119905+1)119896 asfollows

C(119905+1)119896 = sum119899119894=1 119879(119905)119894119896997888rarrp 119894sum119899119894=1 119879(119905)119894119896 119896 = 1 2119879(119905+1)1198941 = 1119879(119905+1)1198942 = 0

if 119863⟨997888rarrp 119894 C(119905+1)1 minus C(119905+1)2 ⟩ le 0119879(119905+1)1198941 = 0119879(119905+1)1198942 = 1

if 119863⟨997888rarrp 119894 C(119905+1)1 minus C(119905+1)2 ⟩ gt 0120591(119905+1)119896 = 119899sum

119894=1

119879(119905+1)119894119896 119896 = 1 2

(4)

Cluster assignment and cluster update are repeated until120591(119905+1)119896

le 2lceillog1198992rceilminus1 or C(119905+1)119896

= C(119905)119896 119896 = 1 2

Output The above three steps run several times withdifferent initial centers C(1)1 C(1)2

Owing to the fact that different initial centers maygenerate different results the result that the sizes of twoclusters are most similar is chosen as a final output The sizesimilarity of two clusters is defined as

119904 (120591(119905+1)119896 ) =

0 prod119896=12

120591(119905+1)119896

= 0min 120591(119905+1)

119896 119896 = 1 2

max 120591(119905+1)119896

119896 = 1 2 prod119896=12

120591(119905+1)119896

= 0 (5)

Finally the output is997888rarrp 119894 isin 119862119871119896 if 119879(119905+1)119894119896 = 1 119894 = 1 119899 119896 = 1 2k = [ C(119905)110038171003817100381710038171003817C(119905)1

10038171003817100381710038171003817 minus C(119905)210038171003817100381710038171003817C(119905)210038171003817100381710038171003817] (6)

The dataset of 119899 points can be divided into two equal (nearlyequal) clusters by the separating vector k

32 UBBT Unencrypted Balanced Binary Tree In thissubsection we build the BBT for the whole database Firstlywe generate an integrated feature 997888rarrp 119894 for each image in thedatabase Then an index vector k of the root node or internalnode of the BBT is generated by a top-down approach BBCalgorithm generates k to separate image features into twoclusters of nearly equal size ie 1198621198711 1198621198712 Namely

997888rarrp 119894 ∙ k ge 0 if 997888rarrp 119894 isin 1198621198711997888rarrp 119894 ∙ k lt 0 if 997888rarrp 119894 isin 1198621198712 (7)

Algorithm 1 recursively divides feature vectors of the databaseimages into two clusters until no cluster has more thantwo feature vectors The hierarchical index tree is shown inFigure 2 The level of the root node is 0 and the level of theleaf node is119867 Thus the height of entire index tree is119867 Thesubtree that the root node is at ℎth level of the index tree hasthe height of119867minusℎ The root and internal node 119906 in our BBTare defined as follows

119906 = ⟨119868119863 k 119906119871 119906119877⟩ (8)

where 119868119863 is the identity of node k is the index vector dividingimages on the leaf nodes into two clusters 1198621198711 and 1198621198712 and119906119871 and 119906119877 are the pointers to the left and right child node If119906 is a leaf node it is defined as

119906 = ⟨119868119863 119894119871 119894119871⟩ (9)

where 119894119871 points to the imagesThe search process starts from the root node of UBBT

If 997888rarrq ∙ k ge 0 we execute iterative retrieval upon the leftsubtree otherwise upon the right subtree until obtaining theleaf node

33 SBBT Secure Balanced Binary Tree InUBBT scheme theleaf node is retrieved by the sign of the inner product 997888rarrq ∙k In order to protect the sensitive data 997888rarrq and k we use thesecure inner product (ie ASPE) [8] to encrypt the vectorsand obtain the same result as the plaintext

First of all the owner generates a random invertiblematrix M isin R119889times119889 to encrypt k The user also use it to encrypt997888rarrq ie

V = kMminus1

Q = M997888rarrq T (10)

where V is the ciphertext index and Q is the ciphertext queryIn addition the JPEG images are encrypted via a partialimage encryption [22 25]Therefore index tree I is encryptedas T The leaf node of T is associated with the encryptedimages Then they are outsourced to the cloud for secureimage retrieval When retrieving the user submits Q of thequery image to the cloud Since VQ = 997888rarrq ∙ k the cloud can

6 Security and Communication Networks

Input 997888rarrp 119894 1 le 119894 le 119873 Balanced Binary Clustering ieBBC

Output Balanced Binary Tree I(1) ℎ = 0 the height of node(2) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 1198941le119894le119873) 997888rarr root node(3) function 119862ℎ119894119897119889119903119890119899119866119890119899 (kectors 997888rarrp 119894)(4) 119861119861119862 (997888rarrp 119894) 997888rarr two clusters1198621198711 1198621198712

separating vector as the index k(5) if the size of 1198621198711 ge 2 then(6) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198711) 997888rarr internal node(7) else(8) 119894th image119894isin1198621198711 997888rarr leaf node(9) end if(10) if the size of 1198621198712 ge 2 then(11) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198712) 997888rarr internal node(12) else(13) 119894th image119894isin1198621198712 997888rarr leaf node(14) end if(15) ℎ = ℎ + 1(16) return k 1198621198711 1198621198712(17) end function

Algorithm 1 Building balanced binary tree

leaf node H

internal node h

root node 0

internal node 1

v

CL1 CL2

subtree

Figure 2 The hierarchical structure of unencrypted BBT

use Q to execute an effective search on secure index tree TTherefore the retrieved result and efficiency in the ciphertextwould always be in line with that in the plaintext Note thatthe more details of ASPE are described in [8]

Unfortunately the relationship between 997888rarrq and Q isdeterministic According to the same Q or VQ the cloud cancount the query frequency of 997888rarrq Further according to thesame result the cloud can also count the access frequencyof 997888rarrq Therefore if the query image and 997888rarrq is a one-to-onerelationship the proposed scheme cannot resist the statisticalattack in the known background model

To break the one-to-one relationship between the queryimage and ciphertext query Q we extend 997888rarrq and k as follows

997888rarrq = 120579 [120574 997888rarrq ]k = [0 k] (11)

where the random numbers 120579 gt 0 120574 ≫ 1 Thus thesame query image can generate different ciphertext queryQ by setting 120579 and 120574 Meanwhile VQ = 120579(997888rarrq ∙ k) isblinded by positive 120579The cloud cannot get available statistical

Security and Communication Networks 7

(a) (b) (c)

Figure 3 (a) a query image (b) one encrypted version and (c) another encrypted version

(a) (b) (c) (d)

(e) (f) (g) (h)

Figure 4 In Corel database [10] except for (a)(e) a query image the most similar image by the retrieval based on (b)(f) HSV histogram(c)(g) DCT histogram or (d)(h) an integrated feature

information from the ciphertext query Q and the result VQHowever the sign of VQ remains the same even if 120574 and 120579 aredifferent so the retrieved result and search path remain thesameThe request frequency of a query image can be inferredfrom the access frequency of the same return image

To overcome the issue we first copy the image databaseThe owner uses partial image encryption [23] to protectplaintext image According to different encryption keys sameplaintext image can be encrypted as different encryptedimages as shown in Figure 3 Meanwhile the owner uses theembedding key to hide which encryption key was used [25]Then the owner shares encryption keys and the embeddingkey with authorized users through a secret channel Afterobtaining an encrypted image the user uses the embeddingkey to extract hidden data and select true encryption key torecover the plaintext image

Second different encrypted images derived from thesame plaintext image correspond to different integratedfeatures For the diversity of the integrated feature we useHSV histogram and DCT histogram as basic features Thereason is that HSV histogram in color space and DCThistogram in transform domain are image features fromdifferent perspectives As shown inFigure 4 different features

of the query image can still retrieve a similar image Forease of description we define the new ciphertext databaseas C1015840 Then we set C corresponding to the feature 997888rarrp 119894 =[1205721p1119894 (1 minus 1205721)p2119894 ] and C1015840 corresponds to the feature 997888rarrp 1198941015840 =[1205722p1119894 (1minus1205722)p2119894 ] Finally we use proposed BBC algorithm tobuild an index tree I from 997888rarrp 119894 997888rarrp 1198941015840 C C1015840 and employ ASPEto encrypt the plaintext tree as a ciphertext tree T

To further enhance query privacy we reduce the heightof the index tree T and return multiple encrypted images tothe query user at the one time Specifically we set the internalnode to be a new leaf node Then the new leaf node containsall the encrypted images in the subtree that the internal nodeof T is a root node Due to the fact that the index tree isbalanced leaf nodes of reduced index tree can always pointto multiple encrypted images Please note that some imagesin a subtree are similar which is a compromise with effectiveretrieval

34 Security Analysis We analyze SBBT concerning theprivacy requirements in the design goals

(1) The database and its index are confidential In SBBTindex vectors k of BBT are encrypted as V by the high-dimensional randommatrix which is kept confidential to the

8 Security and Communication Networks

2 3 4 5 6 7 8 9 101 of Category

005

01

015

02

025

03

035

04

Aver

age C

orre

latio

n of

HSV

0

001

002

003

004

095

Aver

age C

orre

latio

n of

DCT

2 3 4 5 6 7 8 9 101 of Category

corrminusintracorrminusintercorrminusdiff

corrminusintracorrminusintercorrminusdiff

Figure 5 The average correlation of HSV histogram (left) and DCT histogram (right) under same category and different categories

cloud For images in the database the partial encryption canprotect their confidentiality [23 25]

(2) Since the feature vector 997888rarrq of the query image isextended by random numbers 120574 and 120579 it can generatedifferent ciphertext queries Q The cloud cannot identify thesame query image from its different ciphertext queries Thequery pattern is protected

(3) According to different 120573 different features of samequery image can retrieve different images When 120573 is equal to1205721 or 1205722 different results can be recovered to the same queryimage Thus the retrieved result and path of the same queryimage are not unique but the retrieval is still effective Thecloud cannot link the different returned image to the samequery image The access pattern is protected

(4) In addition since the height of SBBT is reduced oneleaf node will point to multiple encrypted images It furtherbreaks the one-to-one relationship between the query imageand the return imagesThe access pattern is further protected

In short the SBBT is secure in the known ciphertextmodel and can resist the statistical attack in the knownbackground model

4 Performance Analysis

In this section we show the experimental results andefficiency analysis of the proposed scheme As we knowthat JPEG is the most commonly used image format andaccounting for up to 95 of images on the web [34] sowe adopt the Corel database [10] for the similarity searchThe database contains 1000 JPEG images which belong to 10categories African Beach Architecture Buses DinosaursElephants Flowers Horses Mountain and Food Each ofthem includes 100 images As [5] we split an image into 256blocks and extract hue saturation and value of brightness

of each block Then we perform hierarchical clustering ofblock features to construct the ldquobag of wordrdquo model Theusage frequency of the visual word is the HSV histogramof the image As [22] DCAC coefficients of image blocksare quantized into different bins and the DCT histogram isthe usage frequency of quantized DCAC coefficients Thusthe color feature p1119894 and texture feature p2119894 are set as the991-dimensional HSV histogram and 780-dimensional DCThistogram respectively

41 Parameters Setting First of all we should set parameters1205721 of 997888rarrp 119894 and 1205722 of 997888rarrp 1198941015840 to construct an index tree Becausethe integrated features compose of HSV histograms and DCThistogram we first evaluate average histogram similarity ofthe images in the same category and different categories(denoted as corr-intra and corr-inter respectively) as shownin Figure 5 We can see that the value of corr-intra is largerthan corr-inter thus images of different categories can bedistinguished by each histogram According to the differencebetween corr-intra and corr-inter (ie corr-diff in Figure 5)we set 1205721 = 01 for the integrated feature 997888rarrp 119894 of the databaseC The purpose is to make the HSV histogram and DCThistogram have the same contribution to the search results

For the parameter 1205722 of 997888rarrp 1198941015840 we hope that the index treeof database C C1015840 can better protect the access pattern Weuse two different metrics to measure the privacy protectionachieved by different 1205722 The two metrics are the number ofdifferent image categories in each subtree and the categoryentropy of each subtree Figure 6 shows the average numberof image categories contained in the subtree of differentlevels Because the index tree of the database based ondifferent1205722 has same height of 11 (ie lceillog20002 rceil) the proposedBBC algorithm performs well However we can see from

Security and Communication Networks 9

Table 1 Average category entropy of each level

Parameter Level of the Index Tree1205722 1 2 3 4 5 6 7 8 9 10 1102 299 249 21 175 138 107 081 062 044 021 003 32 282 207 181 143 118 091 07 048 024 004 332 288 228 18 153 128 102 078 054 028 005 332 303 231 183 156 127 103 079 055 03 006 332 291 237 196 165 141 111 089 061 032 007 332 26 21 171 146 124 103 081 057 029 008 332 292 232 185 152 126 104 081 057 03 009 332 292 225 178 149 127 103 083 058 03 0

0

2

4

6

8

10

Aver

age N

umbe

r of C

ateg

orie

s

2 3 4 5 6 7 8 9 10 111Level of Index Tree

2 = 02

2 = 03

2 = 04

2 = 05

2 = 06

2 = 07

2 = 08

2 = 09

Figure 6 The average number of image categories

Figure 6 that the subtree near the leaf node contains a similarnumber of image categories We will use category entropy tomeasure the privacy The category entropy which is definedas minussum119894(119888119894sum119894 119888119894) log2(119888119894sum119894 119888119894) (119888119894 is the number of existing119894 category) is used to measure the randomness of imagecategories in the subtree From Table 1 we know that theindex tree of 1205722 = 06 has the highest average entropy at the3th-10th level even its average category number at 3th-6thlevel is not the biggest in Figure 6 Thus we set the weightedfactor 1205722 = 06 for the integrated feature 997888rarrp 1198941015840 of database C1015840

42 Index Tree Construction In order to prove that ouralgorithm can classify feature vectors in a balanced mannerwe use average similarity of the cluster size to comparethe clustering performance of proposed BBC algorithmbalanced k-means (ie Bkmeans) k-means++ and FSCLThe average similarity based on (5) is defined as 119878 = 119904(120591(119905+1)

119896)

BBCBkmeans

Kmeans++FSCL

0

005

07

075

08

085

09

095

1

Sim

ilarit

y of

Clu

ster S

ize (

S )

5 10 150Level of Index Tree

Figure 7 The average size similarity of two clusters

FromFigure 7 we can see that the index trees generated by theproposed BBC algorithm Bkmeans k-means++ and FSCLhave the height of 11 12 13 and 16 respectively The smallerthe height of the index tree indicates that the proposedalgorithm can classify feature vectors into two clusters moreevenly at each level of the index tree Figure 7 shows thatthe cluster size of FSCL is the most unbalanced at 1th-11thlevels There are two reasons the first is that the change ofthe weight factor affects the stability of the clustering resultand the second is that the value of weight factor impactsthe distance between the cluster center and other featurevectors On contrast due to the use of conscience weightfactor the proposed algorithmperforms better than Bkmeansand k-means++ and the result is much better than FSCLThe reason is that the bipartite graph matching mechanismconstrains a balance of the two clusters and restricts theinfluence of weight factor In short the weight factor is stilleffective in improve the balance of clustering In fact aboveresult is also similar in the database of different 1205721 1205722

10 Security and Communication Networks

Table 2 Minimal Dunn index of each level

Algorithm Level of the Index Tree1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

BBC 047 001 001 002 001 001 001 001 001 002 infin - - - - -Bkmeans 047 001 002 001 002 002 002 003 004 024 073 infin - - - -Kmeans++ 021 001 001 001 001 001 001 001 002 003 022 105 infin - - -FSCL 047 004 005 006 007 01 015 019 023 038 028 058 062 066 095 infin

Table 3 Time cost of index tree construction

Algorithm BBC Bkmeans Kmeans++[20] FSCLTime (s) 111 2316 1749 254

In further we employ Dunn index (DI) [35] to evaluatementioned clustering algorithms The Dunn index is definedas

119863119868 = minforall119909isin1198621198711forall119910isin1198621198712 119863 ⟨119909 119910⟩max119896=12 maxforall119909119910isin119862119871119896 119863 ⟨119909 119910⟩ (12)

where the numerator is the minimal distance of vectors indifferent clusters and the denominator is the largest within-cluster distance Thus we can know that when separatingtwo points into two clusters the denominator is zero andDI is infinite A high value of DI means good compactnessTable 2 shows the minimal DI of clustering results in eachlevel In the same level of the index tree the most of BBCrsquos DIis smaller than other algorithms (ie Bkmeans K-means++and FSCL) Meanwhile BBCrsquos DI of the last level is muchsmaller than other algorithms Obviously for the proposedBBC algorithm the cluster-size similarity is with higherpriority than clustering compactness

The above algorithms are implemented using MATLABon a PC with an Intel Core i5 32 GHz CPU and 16 GB RAMTime consumption for index tree construction is shown inTable 3 In particular we execute 30 times clustering withrandom initial centers and select the result of highest scoreof cluster-size similarity Bkmeans is most time consumingbecause it is not designed to find a vector that splits a datasetevenly In contrast proposed BBC costs the least time

43 Retrieval Performance To evaluate the retrieval perfor-mance each image in the Corel database is selected as a queryimage The retrieval performance is evaluated by averageprecision-recall (PR) curve ie

119875119903119890119888119894119904119894119900119899 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119903119890119905119906119903119899119890119889 119894119898119886119892119890119904

119877119890119888119886119897119897 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119901119900119904i119905119894V119890 119894119898119886119892119890119904

(13)

Figure 8(a) compares the PR curves of the query feature basedon different 120573 values It can be seen that the performanceof different 120573 achieves high precision when recall belongsto (0 005) Besides when only one image is returned ierecall=1 the precision of 120573 = 01 or 06 is 100The reason

is thatwe set120573 as1205721 or1205722 Although two integrated features ofa query image are different the retrieved results are differentencrypted versions of the same plaintext image Since thequery features the corresponding results and search pathsare different the statistical information of query image isprotected

Figure 8(b) shows the retrieval performance of thereduced index trees with different heights It can be seen thatthe precision increases with the height of index tree even if 120573is set as 0 or 1 Further Figure 9 shows the average maximumheight of a reduced index tree inwhich the returned leaf nodecontains the exact query image It can be seen that the heightof reduced index tree is 11 when 120573 is set as 01 or 06 If 120573 is setcloser to 01 or 06 the height of reduced index tree is nearerto 11 When the height is 9 the most of exact query images areretrievable in the reduced index tree and one search returnsfour (22) images Fortunately as shown in Figure 8(b) theprecision is about 70 at the 9th level of the index tree Thusthe retrieval is still effective

44 Privacy Protection We consider the privacy protectionfrom two aspects query pattern and access pattern Since thequery feature is an integrated histogram ie 997888rarrq = [120573q1 (1 minus120573)q2] Different 120573 correspond to different query features Inaddition even if 120573 is determined the feature will be extendedby random numbers 120579 120574 as 997888rarrq = 120579[120574 997888rarrq ] Thus the encryptedfeature is Q = M(120579[120574 120573q1 (1 minus 120573)q2])T Obviously theencrypted feature of a query image is not unique The querypattern of each query image is protected

To protect access pattern the retrieved result of a queryimage should be not uniqueWehave built an index tree basedon features of original database (1205721 = 01) and features of thecopied database (1205722 = 06) Then the integrated features ofdifferent 120573 are used to retrieve in the index tree We know thefact that the returned result is a leaf node of the index treeand different leaf nodes indicate different access paths So wedefine a path privacy to evaluate the protection level of accesspattern

119875119903119894V119886119888119910 = 119900119891 119889119894119891119891119890119903119890119899119905 119904119890119886119903119888ℎ 119901119886119905ℎ119904 119900119891 119886119897119897 119904119890119886119903119888ℎ 119901119886119905ℎ119904 (14)

Different search path can be obtained by using the queryfeature of different 120573 Figures 10(a) and 10(b) show the path

Security and Communication Networks 11

=0=0025=005=0075=01=03

=035=04=06=08=1

40

50

60

70

80

90

100Pr

ecisi

on (

)

5 10 15 20 25 300Recall ()

(a)

0 2 4 6 8 10 120

20

40

60

80

100

Height of Index Tree

Prec

ision

()

=0=0025=005=0075=01=03

=035=04=06=08=1

(b)

Figure 8 (a) Precision-recall curves of original index tree (b) Retrieval precision of reduced index trees

0 0025 005 0075 01 03 035 04 06 08 1

Parameter

8

9

10

11

12

Hei

ght o

f Red

uced

Inde

x Tr

ee

Figure 9 Under different 120573 the average maximum height of areduced index tree that the query image can retrieve itself

privacy of different 120573 where 120573 = 01 and 06 are baselinesrespectively It can be seen that the path privacy is 100when120573 is set as 06 in Figure 10(a) and 01 in Figure 10(b) Thereason is that retrieval is separately carried out in the left orright subtree of the root node Figure 10(a) also shows thatsearch paths are in the same subtree of the root node when120573 is set to 06 08 or 1 But the search path of 120573 = 08 or1 is still different with 120573 = 06 as shown in Figure 10(b)Similarly when 120573 is set to 0 00025 005 0075 01 03 or035 the search path is in the same subtree of the root nodebut still different Besides when 120573 is set to 04 the searchtraverses in both subtrees of the root node In addition thepath privacy increases with the height of the tree In short the

access pattern can be protected by setting different 120573 Pleasenote that usage frequency of different 120573 can be optimizedbased on the frequency distribution of the query images

In summary Table 4 compares the proposed scheme withother schemes in the terms of search efficiency query patternprotection and access pattern protection We compare thesearch efficiency of different schemes by theoretical analysisinstead of experiments The reason is that encrypted imageretrieval schemes are based on different image databasesaccording to the application Also some schemes utilizeencrypted image feature but other schemes employ the hashcode of the image feature

In the schemes under vector space model such as Yuan[20] a skew index tree leads to119874(log119873 sim 119873) search time Xia[19] uses local-sensitive hash (LSH) to construct a prefiltertable Then with corresponding image feature the cloudcarries out a refinement of candidate results Thus it is asublinear scheme Cheng [22 23] does not build an indextree so the retrieval efficiency is 119874(119873) Weng [6 7] usesa hash table to improve the search efficiency (ie 119874(1 sim119873)) In short the index tree or hash table is a trade-offbetween space and efficiency Although Wengrsquos scheme [7]and proposed scheme both protect query pattern and accesspattern the hash code and feature vector are respectivelyused SpecificallyWeng [7] omits certain bits of the hash codeof query image and returns multiple candidates to the userfor refinement The user is involved in completing retrievaland obtaining the accurate result In our scheme the usercan set 120573 as 1205721 or 1205722 to retrieve exact query image throughdifferent paths Unlike [7] our scheme does not need the userto participate in the feature comparison However to protect

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 4: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

4 Security and Communication Networks

24 Notations Assume that the total number of images in thedatabase is119873 We define common notations as follows

(i) D the database namely the set of images

(ii) C the ciphertext form of D stored in the cloud

(iii) p1119894 the normalized feature such as HSV histogramfrom the 119894th image 1 le 119894 le 119873 it is a 1198891-dimensionalrow vector

(iv) p2119894 the normalized feature such as DCT histogramfrom the 119894th image 1 le 119894 le 119873 it is a 1198892-dimensionalrow vector

(v) 997888rarrp 119894 the integrated feature 997888rarrp 119894 = [120572p1119894 (1 minus 120572)p2119894 ] aweighted factor 120572 isin [0 1]

(vi) k the index vector in each node of BBT it is generatedfrom 997888rarrp119894

(vii) I the119867minusheight BBT for the whole image database D119867 asymp lceillog1198732 rceil(viii) V the ciphertext index which corresponds to plain-

text k

(ix) T the ciphertext form of I

(x) q1 the normalized feature of the query image such asHSV histogram

(xi) q2 the normalized feature of the query image such asDCT histogram

(xii) 997888rarrq the query vector 997888rarrq = [120573q1 (1 minus 120573)q2] a weightedfactor 120573 isin [0 1]

(xiii) M the encryption key M isin R(119889+1)times(119889+1) is a randominvertible matrix 119889 = 1198891 + 1198892

(xiv) Q the ciphertext query corresponding to plaintext 997888rarrqTo find the k-nearest neighbor of the query image we

adopt ldquoinner product similarityrdquo to evaluate the distanceamong the features Specifically 997888rarrp 119894 is an integrated featureof the database image including HSV histogram and DCThistogram 997888rarrq is the integrated feature of the query imageThefinal similarity score is obtained by997888rarrp 119894 ∙997888rarrq Based on the scorethe most relevant images in the database are returned to theuser

3 Proposed Scheme

In order to achieve logarithmic search efficiency we firstbuild a balanced index tree for the plaintext image databasein which a novel balanced binary clustering algorithm isproposed Then to ensure the confidentiality of index vectork and query vector 997888rarrq the secure inner product computationis employed to encrypt the vector k and 997888rarrq Also we adopt apartial image encryption to encrypt database images Finallywe improve the scheme to enhance privacy protection in theknown background model

31 BBC Algorithm Balanced Binary Clustering To buildthe balanced index tree we need a balanced binary clus-tering (BBC) algorithm to find a separating vector k whichdivides each dataset into two equal (or nearly equal) clustersHowever traditional k-means aims at finding cluster centersso that the sum of the distance between each point andits nearest center is maximized rather than the equal sizeof clusters Thus frequency sensitive competitive learning(FSCL) [27] is introduced to improve k-means FSCL is aconscience type algorithm that takes the size of clusters asa weight factor which makes the bigger cluster less likely towin more points in next iteration It can converge to a localminimum [28] Although the difference of cluster centerscan be used as a separating vector k FSCL does not ensurethat the k clusters are equally sized In fact it is experiencedas very unstable in high-dimensional vector spaces [29] Onthe contrast balanced k-means [30] can split a dataset intotwo equal-sized clusters by finding a perfect matching ina bipartite graph But it does not generate the separatingvector k Thus we propose a novel BBC to find a separatingvector that makes the size of two clusters equal To our bestknowledge this work is the first endeavor to combine FSCLand balanced k-means to achieve balanced clustering Thedetail of BBC algorithm is elaborated in the following

Given 119899 points 997888rarrp 119894119899119894=1 isin R119889 at the 119905th iteration the sizeof cluster is 120591(119905)

119896ge 0 and the cluster center is C(119905)

119896 119896 = 1 2

The BBC algorithm works as followsInitializationWeadopt k-means++ to choose twopoints

from 997888rarrp 119894119899119894=1 as initial centers C(1)119896

119896 = 1 2First uniformly choose an initial center C(1)1 from 997888rarrp 119894119899119894=1Second choose the next initial center C(1)2 from 997888rarrp 119894 997888rarrp 119894 =

C(1)1 119899119894=1 with probability D⟨997888rarrp 119894 C(1)1 ⟩ sum119863⟨997888rarrp 119894 C(1)1 ⟩119863⟨997888rarrp 119894 C(1)1 ⟩ = 1 minus (997888rarrp 119894 ∙ C(1)1 )997888rarrp 119894C(1)1 Finally according to the distance to the initial centers

other points are divided into two categories 119862119871119896 the size ofwhich is 120591(1)

119896 119896 = 1 2

Cluster Assignment Let new 119879(119905)119894119896

be the solution ofoptimization problem as follows

min 119869 (119879(119905)119894119896 ) = 2sum119896=1

(120591(119905)119896 sdot 119899sum119894=1

119879(119905)119894119896119863⟨997888rarrp 119894 C(119905)119896 ⟩) (1)

st

119879(119905)119894119896 isin 0 1 119894 = 1 119899 119896 = 1 22sum119896=1

119879(119905)119894119896 = 1 119894 = 1 119899119899sum119894=1

119879(119905)119894119896 ge lfloor1198992rfloor 119896 = 1 2(2)

where119863⟨997888rarrp119894 C(119905)119896 ⟩ = 1minus(997888rarrp 119894 ∙C(119905)119896 )997888rarrp 119894C(119905)119896 A smaller valueof119863⟨997888rarrp 119894 C(119905)

119896⟩ indicates that the two vectors 997888rarrp 119894 C(119905)

119896are more

similar

Security and Communication Networks 5

Inspired by balanced k-means we first turn (1) as a linearassignment problem (LAP) [31] The cost matrix is defined as

119862119864 =[[[[[[[[[[

119863⟨997888rarrp 1 C(1)119896

⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(1)119896

⟩119863⟨997888rarrp 1 C(2)119896 ⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(2)119896 ⟩

sdot sdot sdot119863 ⟨997888rarrp 1 C(1)

119896⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(1)

119896⟩

119863⟨997888rarrp 1 C(2)119896

⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(2)119896

]]]]]]]]]]

(3)

where119862119864 isin R2lceil1198992rceiltimes119899 If it is not a square matrix a column ofzeros is added to make it square Then we adopt Volgenant-Jonker (VJ) algorithm [32 33] to find a perfect matching in abipartite graph Finally if the vector 997888rarrp 119894 is assigned to centerC(119905)119896 119879(119905)119894119896

= 1 and 0 otherwiseCluster Update Update the cluster centers C(119905+1)

119896 the

ldquoselectionrdquo variables 119879(119905+1)119894119896 and the size of clusters 120591(119905+1)119896 asfollows

C(119905+1)119896 = sum119899119894=1 119879(119905)119894119896997888rarrp 119894sum119899119894=1 119879(119905)119894119896 119896 = 1 2119879(119905+1)1198941 = 1119879(119905+1)1198942 = 0

if 119863⟨997888rarrp 119894 C(119905+1)1 minus C(119905+1)2 ⟩ le 0119879(119905+1)1198941 = 0119879(119905+1)1198942 = 1

if 119863⟨997888rarrp 119894 C(119905+1)1 minus C(119905+1)2 ⟩ gt 0120591(119905+1)119896 = 119899sum

119894=1

119879(119905+1)119894119896 119896 = 1 2

(4)

Cluster assignment and cluster update are repeated until120591(119905+1)119896

le 2lceillog1198992rceilminus1 or C(119905+1)119896

= C(119905)119896 119896 = 1 2

Output The above three steps run several times withdifferent initial centers C(1)1 C(1)2

Owing to the fact that different initial centers maygenerate different results the result that the sizes of twoclusters are most similar is chosen as a final output The sizesimilarity of two clusters is defined as

119904 (120591(119905+1)119896 ) =

0 prod119896=12

120591(119905+1)119896

= 0min 120591(119905+1)

119896 119896 = 1 2

max 120591(119905+1)119896

119896 = 1 2 prod119896=12

120591(119905+1)119896

= 0 (5)

Finally the output is997888rarrp 119894 isin 119862119871119896 if 119879(119905+1)119894119896 = 1 119894 = 1 119899 119896 = 1 2k = [ C(119905)110038171003817100381710038171003817C(119905)1

10038171003817100381710038171003817 minus C(119905)210038171003817100381710038171003817C(119905)210038171003817100381710038171003817] (6)

The dataset of 119899 points can be divided into two equal (nearlyequal) clusters by the separating vector k

32 UBBT Unencrypted Balanced Binary Tree In thissubsection we build the BBT for the whole database Firstlywe generate an integrated feature 997888rarrp 119894 for each image in thedatabase Then an index vector k of the root node or internalnode of the BBT is generated by a top-down approach BBCalgorithm generates k to separate image features into twoclusters of nearly equal size ie 1198621198711 1198621198712 Namely

997888rarrp 119894 ∙ k ge 0 if 997888rarrp 119894 isin 1198621198711997888rarrp 119894 ∙ k lt 0 if 997888rarrp 119894 isin 1198621198712 (7)

Algorithm 1 recursively divides feature vectors of the databaseimages into two clusters until no cluster has more thantwo feature vectors The hierarchical index tree is shown inFigure 2 The level of the root node is 0 and the level of theleaf node is119867 Thus the height of entire index tree is119867 Thesubtree that the root node is at ℎth level of the index tree hasthe height of119867minusℎ The root and internal node 119906 in our BBTare defined as follows

119906 = ⟨119868119863 k 119906119871 119906119877⟩ (8)

where 119868119863 is the identity of node k is the index vector dividingimages on the leaf nodes into two clusters 1198621198711 and 1198621198712 and119906119871 and 119906119877 are the pointers to the left and right child node If119906 is a leaf node it is defined as

119906 = ⟨119868119863 119894119871 119894119871⟩ (9)

where 119894119871 points to the imagesThe search process starts from the root node of UBBT

If 997888rarrq ∙ k ge 0 we execute iterative retrieval upon the leftsubtree otherwise upon the right subtree until obtaining theleaf node

33 SBBT Secure Balanced Binary Tree InUBBT scheme theleaf node is retrieved by the sign of the inner product 997888rarrq ∙k In order to protect the sensitive data 997888rarrq and k we use thesecure inner product (ie ASPE) [8] to encrypt the vectorsand obtain the same result as the plaintext

First of all the owner generates a random invertiblematrix M isin R119889times119889 to encrypt k The user also use it to encrypt997888rarrq ie

V = kMminus1

Q = M997888rarrq T (10)

where V is the ciphertext index and Q is the ciphertext queryIn addition the JPEG images are encrypted via a partialimage encryption [22 25]Therefore index tree I is encryptedas T The leaf node of T is associated with the encryptedimages Then they are outsourced to the cloud for secureimage retrieval When retrieving the user submits Q of thequery image to the cloud Since VQ = 997888rarrq ∙ k the cloud can

6 Security and Communication Networks

Input 997888rarrp 119894 1 le 119894 le 119873 Balanced Binary Clustering ieBBC

Output Balanced Binary Tree I(1) ℎ = 0 the height of node(2) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 1198941le119894le119873) 997888rarr root node(3) function 119862ℎ119894119897119889119903119890119899119866119890119899 (kectors 997888rarrp 119894)(4) 119861119861119862 (997888rarrp 119894) 997888rarr two clusters1198621198711 1198621198712

separating vector as the index k(5) if the size of 1198621198711 ge 2 then(6) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198711) 997888rarr internal node(7) else(8) 119894th image119894isin1198621198711 997888rarr leaf node(9) end if(10) if the size of 1198621198712 ge 2 then(11) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198712) 997888rarr internal node(12) else(13) 119894th image119894isin1198621198712 997888rarr leaf node(14) end if(15) ℎ = ℎ + 1(16) return k 1198621198711 1198621198712(17) end function

Algorithm 1 Building balanced binary tree

leaf node H

internal node h

root node 0

internal node 1

v

CL1 CL2

subtree

Figure 2 The hierarchical structure of unencrypted BBT

use Q to execute an effective search on secure index tree TTherefore the retrieved result and efficiency in the ciphertextwould always be in line with that in the plaintext Note thatthe more details of ASPE are described in [8]

Unfortunately the relationship between 997888rarrq and Q isdeterministic According to the same Q or VQ the cloud cancount the query frequency of 997888rarrq Further according to thesame result the cloud can also count the access frequencyof 997888rarrq Therefore if the query image and 997888rarrq is a one-to-onerelationship the proposed scheme cannot resist the statisticalattack in the known background model

To break the one-to-one relationship between the queryimage and ciphertext query Q we extend 997888rarrq and k as follows

997888rarrq = 120579 [120574 997888rarrq ]k = [0 k] (11)

where the random numbers 120579 gt 0 120574 ≫ 1 Thus thesame query image can generate different ciphertext queryQ by setting 120579 and 120574 Meanwhile VQ = 120579(997888rarrq ∙ k) isblinded by positive 120579The cloud cannot get available statistical

Security and Communication Networks 7

(a) (b) (c)

Figure 3 (a) a query image (b) one encrypted version and (c) another encrypted version

(a) (b) (c) (d)

(e) (f) (g) (h)

Figure 4 In Corel database [10] except for (a)(e) a query image the most similar image by the retrieval based on (b)(f) HSV histogram(c)(g) DCT histogram or (d)(h) an integrated feature

information from the ciphertext query Q and the result VQHowever the sign of VQ remains the same even if 120574 and 120579 aredifferent so the retrieved result and search path remain thesameThe request frequency of a query image can be inferredfrom the access frequency of the same return image

To overcome the issue we first copy the image databaseThe owner uses partial image encryption [23] to protectplaintext image According to different encryption keys sameplaintext image can be encrypted as different encryptedimages as shown in Figure 3 Meanwhile the owner uses theembedding key to hide which encryption key was used [25]Then the owner shares encryption keys and the embeddingkey with authorized users through a secret channel Afterobtaining an encrypted image the user uses the embeddingkey to extract hidden data and select true encryption key torecover the plaintext image

Second different encrypted images derived from thesame plaintext image correspond to different integratedfeatures For the diversity of the integrated feature we useHSV histogram and DCT histogram as basic features Thereason is that HSV histogram in color space and DCThistogram in transform domain are image features fromdifferent perspectives As shown inFigure 4 different features

of the query image can still retrieve a similar image Forease of description we define the new ciphertext databaseas C1015840 Then we set C corresponding to the feature 997888rarrp 119894 =[1205721p1119894 (1 minus 1205721)p2119894 ] and C1015840 corresponds to the feature 997888rarrp 1198941015840 =[1205722p1119894 (1minus1205722)p2119894 ] Finally we use proposed BBC algorithm tobuild an index tree I from 997888rarrp 119894 997888rarrp 1198941015840 C C1015840 and employ ASPEto encrypt the plaintext tree as a ciphertext tree T

To further enhance query privacy we reduce the heightof the index tree T and return multiple encrypted images tothe query user at the one time Specifically we set the internalnode to be a new leaf node Then the new leaf node containsall the encrypted images in the subtree that the internal nodeof T is a root node Due to the fact that the index tree isbalanced leaf nodes of reduced index tree can always pointto multiple encrypted images Please note that some imagesin a subtree are similar which is a compromise with effectiveretrieval

34 Security Analysis We analyze SBBT concerning theprivacy requirements in the design goals

(1) The database and its index are confidential In SBBTindex vectors k of BBT are encrypted as V by the high-dimensional randommatrix which is kept confidential to the

8 Security and Communication Networks

2 3 4 5 6 7 8 9 101 of Category

005

01

015

02

025

03

035

04

Aver

age C

orre

latio

n of

HSV

0

001

002

003

004

095

Aver

age C

orre

latio

n of

DCT

2 3 4 5 6 7 8 9 101 of Category

corrminusintracorrminusintercorrminusdiff

corrminusintracorrminusintercorrminusdiff

Figure 5 The average correlation of HSV histogram (left) and DCT histogram (right) under same category and different categories

cloud For images in the database the partial encryption canprotect their confidentiality [23 25]

(2) Since the feature vector 997888rarrq of the query image isextended by random numbers 120574 and 120579 it can generatedifferent ciphertext queries Q The cloud cannot identify thesame query image from its different ciphertext queries Thequery pattern is protected

(3) According to different 120573 different features of samequery image can retrieve different images When 120573 is equal to1205721 or 1205722 different results can be recovered to the same queryimage Thus the retrieved result and path of the same queryimage are not unique but the retrieval is still effective Thecloud cannot link the different returned image to the samequery image The access pattern is protected

(4) In addition since the height of SBBT is reduced oneleaf node will point to multiple encrypted images It furtherbreaks the one-to-one relationship between the query imageand the return imagesThe access pattern is further protected

In short the SBBT is secure in the known ciphertextmodel and can resist the statistical attack in the knownbackground model

4 Performance Analysis

In this section we show the experimental results andefficiency analysis of the proposed scheme As we knowthat JPEG is the most commonly used image format andaccounting for up to 95 of images on the web [34] sowe adopt the Corel database [10] for the similarity searchThe database contains 1000 JPEG images which belong to 10categories African Beach Architecture Buses DinosaursElephants Flowers Horses Mountain and Food Each ofthem includes 100 images As [5] we split an image into 256blocks and extract hue saturation and value of brightness

of each block Then we perform hierarchical clustering ofblock features to construct the ldquobag of wordrdquo model Theusage frequency of the visual word is the HSV histogramof the image As [22] DCAC coefficients of image blocksare quantized into different bins and the DCT histogram isthe usage frequency of quantized DCAC coefficients Thusthe color feature p1119894 and texture feature p2119894 are set as the991-dimensional HSV histogram and 780-dimensional DCThistogram respectively

41 Parameters Setting First of all we should set parameters1205721 of 997888rarrp 119894 and 1205722 of 997888rarrp 1198941015840 to construct an index tree Becausethe integrated features compose of HSV histograms and DCThistogram we first evaluate average histogram similarity ofthe images in the same category and different categories(denoted as corr-intra and corr-inter respectively) as shownin Figure 5 We can see that the value of corr-intra is largerthan corr-inter thus images of different categories can bedistinguished by each histogram According to the differencebetween corr-intra and corr-inter (ie corr-diff in Figure 5)we set 1205721 = 01 for the integrated feature 997888rarrp 119894 of the databaseC The purpose is to make the HSV histogram and DCThistogram have the same contribution to the search results

For the parameter 1205722 of 997888rarrp 1198941015840 we hope that the index treeof database C C1015840 can better protect the access pattern Weuse two different metrics to measure the privacy protectionachieved by different 1205722 The two metrics are the number ofdifferent image categories in each subtree and the categoryentropy of each subtree Figure 6 shows the average numberof image categories contained in the subtree of differentlevels Because the index tree of the database based ondifferent1205722 has same height of 11 (ie lceillog20002 rceil) the proposedBBC algorithm performs well However we can see from

Security and Communication Networks 9

Table 1 Average category entropy of each level

Parameter Level of the Index Tree1205722 1 2 3 4 5 6 7 8 9 10 1102 299 249 21 175 138 107 081 062 044 021 003 32 282 207 181 143 118 091 07 048 024 004 332 288 228 18 153 128 102 078 054 028 005 332 303 231 183 156 127 103 079 055 03 006 332 291 237 196 165 141 111 089 061 032 007 332 26 21 171 146 124 103 081 057 029 008 332 292 232 185 152 126 104 081 057 03 009 332 292 225 178 149 127 103 083 058 03 0

0

2

4

6

8

10

Aver

age N

umbe

r of C

ateg

orie

s

2 3 4 5 6 7 8 9 10 111Level of Index Tree

2 = 02

2 = 03

2 = 04

2 = 05

2 = 06

2 = 07

2 = 08

2 = 09

Figure 6 The average number of image categories

Figure 6 that the subtree near the leaf node contains a similarnumber of image categories We will use category entropy tomeasure the privacy The category entropy which is definedas minussum119894(119888119894sum119894 119888119894) log2(119888119894sum119894 119888119894) (119888119894 is the number of existing119894 category) is used to measure the randomness of imagecategories in the subtree From Table 1 we know that theindex tree of 1205722 = 06 has the highest average entropy at the3th-10th level even its average category number at 3th-6thlevel is not the biggest in Figure 6 Thus we set the weightedfactor 1205722 = 06 for the integrated feature 997888rarrp 1198941015840 of database C1015840

42 Index Tree Construction In order to prove that ouralgorithm can classify feature vectors in a balanced mannerwe use average similarity of the cluster size to comparethe clustering performance of proposed BBC algorithmbalanced k-means (ie Bkmeans) k-means++ and FSCLThe average similarity based on (5) is defined as 119878 = 119904(120591(119905+1)

119896)

BBCBkmeans

Kmeans++FSCL

0

005

07

075

08

085

09

095

1

Sim

ilarit

y of

Clu

ster S

ize (

S )

5 10 150Level of Index Tree

Figure 7 The average size similarity of two clusters

FromFigure 7 we can see that the index trees generated by theproposed BBC algorithm Bkmeans k-means++ and FSCLhave the height of 11 12 13 and 16 respectively The smallerthe height of the index tree indicates that the proposedalgorithm can classify feature vectors into two clusters moreevenly at each level of the index tree Figure 7 shows thatthe cluster size of FSCL is the most unbalanced at 1th-11thlevels There are two reasons the first is that the change ofthe weight factor affects the stability of the clustering resultand the second is that the value of weight factor impactsthe distance between the cluster center and other featurevectors On contrast due to the use of conscience weightfactor the proposed algorithmperforms better than Bkmeansand k-means++ and the result is much better than FSCLThe reason is that the bipartite graph matching mechanismconstrains a balance of the two clusters and restricts theinfluence of weight factor In short the weight factor is stilleffective in improve the balance of clustering In fact aboveresult is also similar in the database of different 1205721 1205722

10 Security and Communication Networks

Table 2 Minimal Dunn index of each level

Algorithm Level of the Index Tree1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

BBC 047 001 001 002 001 001 001 001 001 002 infin - - - - -Bkmeans 047 001 002 001 002 002 002 003 004 024 073 infin - - - -Kmeans++ 021 001 001 001 001 001 001 001 002 003 022 105 infin - - -FSCL 047 004 005 006 007 01 015 019 023 038 028 058 062 066 095 infin

Table 3 Time cost of index tree construction

Algorithm BBC Bkmeans Kmeans++[20] FSCLTime (s) 111 2316 1749 254

In further we employ Dunn index (DI) [35] to evaluatementioned clustering algorithms The Dunn index is definedas

119863119868 = minforall119909isin1198621198711forall119910isin1198621198712 119863 ⟨119909 119910⟩max119896=12 maxforall119909119910isin119862119871119896 119863 ⟨119909 119910⟩ (12)

where the numerator is the minimal distance of vectors indifferent clusters and the denominator is the largest within-cluster distance Thus we can know that when separatingtwo points into two clusters the denominator is zero andDI is infinite A high value of DI means good compactnessTable 2 shows the minimal DI of clustering results in eachlevel In the same level of the index tree the most of BBCrsquos DIis smaller than other algorithms (ie Bkmeans K-means++and FSCL) Meanwhile BBCrsquos DI of the last level is muchsmaller than other algorithms Obviously for the proposedBBC algorithm the cluster-size similarity is with higherpriority than clustering compactness

The above algorithms are implemented using MATLABon a PC with an Intel Core i5 32 GHz CPU and 16 GB RAMTime consumption for index tree construction is shown inTable 3 In particular we execute 30 times clustering withrandom initial centers and select the result of highest scoreof cluster-size similarity Bkmeans is most time consumingbecause it is not designed to find a vector that splits a datasetevenly In contrast proposed BBC costs the least time

43 Retrieval Performance To evaluate the retrieval perfor-mance each image in the Corel database is selected as a queryimage The retrieval performance is evaluated by averageprecision-recall (PR) curve ie

119875119903119890119888119894119904119894119900119899 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119903119890119905119906119903119899119890119889 119894119898119886119892119890119904

119877119890119888119886119897119897 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119901119900119904i119905119894V119890 119894119898119886119892119890119904

(13)

Figure 8(a) compares the PR curves of the query feature basedon different 120573 values It can be seen that the performanceof different 120573 achieves high precision when recall belongsto (0 005) Besides when only one image is returned ierecall=1 the precision of 120573 = 01 or 06 is 100The reason

is thatwe set120573 as1205721 or1205722 Although two integrated features ofa query image are different the retrieved results are differentencrypted versions of the same plaintext image Since thequery features the corresponding results and search pathsare different the statistical information of query image isprotected

Figure 8(b) shows the retrieval performance of thereduced index trees with different heights It can be seen thatthe precision increases with the height of index tree even if 120573is set as 0 or 1 Further Figure 9 shows the average maximumheight of a reduced index tree inwhich the returned leaf nodecontains the exact query image It can be seen that the heightof reduced index tree is 11 when 120573 is set as 01 or 06 If 120573 is setcloser to 01 or 06 the height of reduced index tree is nearerto 11 When the height is 9 the most of exact query images areretrievable in the reduced index tree and one search returnsfour (22) images Fortunately as shown in Figure 8(b) theprecision is about 70 at the 9th level of the index tree Thusthe retrieval is still effective

44 Privacy Protection We consider the privacy protectionfrom two aspects query pattern and access pattern Since thequery feature is an integrated histogram ie 997888rarrq = [120573q1 (1 minus120573)q2] Different 120573 correspond to different query features Inaddition even if 120573 is determined the feature will be extendedby random numbers 120579 120574 as 997888rarrq = 120579[120574 997888rarrq ] Thus the encryptedfeature is Q = M(120579[120574 120573q1 (1 minus 120573)q2])T Obviously theencrypted feature of a query image is not unique The querypattern of each query image is protected

To protect access pattern the retrieved result of a queryimage should be not uniqueWehave built an index tree basedon features of original database (1205721 = 01) and features of thecopied database (1205722 = 06) Then the integrated features ofdifferent 120573 are used to retrieve in the index tree We know thefact that the returned result is a leaf node of the index treeand different leaf nodes indicate different access paths So wedefine a path privacy to evaluate the protection level of accesspattern

119875119903119894V119886119888119910 = 119900119891 119889119894119891119891119890119903119890119899119905 119904119890119886119903119888ℎ 119901119886119905ℎ119904 119900119891 119886119897119897 119904119890119886119903119888ℎ 119901119886119905ℎ119904 (14)

Different search path can be obtained by using the queryfeature of different 120573 Figures 10(a) and 10(b) show the path

Security and Communication Networks 11

=0=0025=005=0075=01=03

=035=04=06=08=1

40

50

60

70

80

90

100Pr

ecisi

on (

)

5 10 15 20 25 300Recall ()

(a)

0 2 4 6 8 10 120

20

40

60

80

100

Height of Index Tree

Prec

ision

()

=0=0025=005=0075=01=03

=035=04=06=08=1

(b)

Figure 8 (a) Precision-recall curves of original index tree (b) Retrieval precision of reduced index trees

0 0025 005 0075 01 03 035 04 06 08 1

Parameter

8

9

10

11

12

Hei

ght o

f Red

uced

Inde

x Tr

ee

Figure 9 Under different 120573 the average maximum height of areduced index tree that the query image can retrieve itself

privacy of different 120573 where 120573 = 01 and 06 are baselinesrespectively It can be seen that the path privacy is 100when120573 is set as 06 in Figure 10(a) and 01 in Figure 10(b) Thereason is that retrieval is separately carried out in the left orright subtree of the root node Figure 10(a) also shows thatsearch paths are in the same subtree of the root node when120573 is set to 06 08 or 1 But the search path of 120573 = 08 or1 is still different with 120573 = 06 as shown in Figure 10(b)Similarly when 120573 is set to 0 00025 005 0075 01 03 or035 the search path is in the same subtree of the root nodebut still different Besides when 120573 is set to 04 the searchtraverses in both subtrees of the root node In addition thepath privacy increases with the height of the tree In short the

access pattern can be protected by setting different 120573 Pleasenote that usage frequency of different 120573 can be optimizedbased on the frequency distribution of the query images

In summary Table 4 compares the proposed scheme withother schemes in the terms of search efficiency query patternprotection and access pattern protection We compare thesearch efficiency of different schemes by theoretical analysisinstead of experiments The reason is that encrypted imageretrieval schemes are based on different image databasesaccording to the application Also some schemes utilizeencrypted image feature but other schemes employ the hashcode of the image feature

In the schemes under vector space model such as Yuan[20] a skew index tree leads to119874(log119873 sim 119873) search time Xia[19] uses local-sensitive hash (LSH) to construct a prefiltertable Then with corresponding image feature the cloudcarries out a refinement of candidate results Thus it is asublinear scheme Cheng [22 23] does not build an indextree so the retrieval efficiency is 119874(119873) Weng [6 7] usesa hash table to improve the search efficiency (ie 119874(1 sim119873)) In short the index tree or hash table is a trade-offbetween space and efficiency Although Wengrsquos scheme [7]and proposed scheme both protect query pattern and accesspattern the hash code and feature vector are respectivelyused SpecificallyWeng [7] omits certain bits of the hash codeof query image and returns multiple candidates to the userfor refinement The user is involved in completing retrievaland obtaining the accurate result In our scheme the usercan set 120573 as 1205721 or 1205722 to retrieve exact query image throughdifferent paths Unlike [7] our scheme does not need the userto participate in the feature comparison However to protect

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 5: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

Security and Communication Networks 5

Inspired by balanced k-means we first turn (1) as a linearassignment problem (LAP) [31] The cost matrix is defined as

119862119864 =[[[[[[[[[[

119863⟨997888rarrp 1 C(1)119896

⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(1)119896

⟩119863⟨997888rarrp 1 C(2)119896 ⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(2)119896 ⟩

sdot sdot sdot119863 ⟨997888rarrp 1 C(1)

119896⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(1)

119896⟩

119863⟨997888rarrp 1 C(2)119896

⟩ sdot sdot sdot 119863 ⟨997888rarrp 119899 C(2)119896

]]]]]]]]]]

(3)

where119862119864 isin R2lceil1198992rceiltimes119899 If it is not a square matrix a column ofzeros is added to make it square Then we adopt Volgenant-Jonker (VJ) algorithm [32 33] to find a perfect matching in abipartite graph Finally if the vector 997888rarrp 119894 is assigned to centerC(119905)119896 119879(119905)119894119896

= 1 and 0 otherwiseCluster Update Update the cluster centers C(119905+1)

119896 the

ldquoselectionrdquo variables 119879(119905+1)119894119896 and the size of clusters 120591(119905+1)119896 asfollows

C(119905+1)119896 = sum119899119894=1 119879(119905)119894119896997888rarrp 119894sum119899119894=1 119879(119905)119894119896 119896 = 1 2119879(119905+1)1198941 = 1119879(119905+1)1198942 = 0

if 119863⟨997888rarrp 119894 C(119905+1)1 minus C(119905+1)2 ⟩ le 0119879(119905+1)1198941 = 0119879(119905+1)1198942 = 1

if 119863⟨997888rarrp 119894 C(119905+1)1 minus C(119905+1)2 ⟩ gt 0120591(119905+1)119896 = 119899sum

119894=1

119879(119905+1)119894119896 119896 = 1 2

(4)

Cluster assignment and cluster update are repeated until120591(119905+1)119896

le 2lceillog1198992rceilminus1 or C(119905+1)119896

= C(119905)119896 119896 = 1 2

Output The above three steps run several times withdifferent initial centers C(1)1 C(1)2

Owing to the fact that different initial centers maygenerate different results the result that the sizes of twoclusters are most similar is chosen as a final output The sizesimilarity of two clusters is defined as

119904 (120591(119905+1)119896 ) =

0 prod119896=12

120591(119905+1)119896

= 0min 120591(119905+1)

119896 119896 = 1 2

max 120591(119905+1)119896

119896 = 1 2 prod119896=12

120591(119905+1)119896

= 0 (5)

Finally the output is997888rarrp 119894 isin 119862119871119896 if 119879(119905+1)119894119896 = 1 119894 = 1 119899 119896 = 1 2k = [ C(119905)110038171003817100381710038171003817C(119905)1

10038171003817100381710038171003817 minus C(119905)210038171003817100381710038171003817C(119905)210038171003817100381710038171003817] (6)

The dataset of 119899 points can be divided into two equal (nearlyequal) clusters by the separating vector k

32 UBBT Unencrypted Balanced Binary Tree In thissubsection we build the BBT for the whole database Firstlywe generate an integrated feature 997888rarrp 119894 for each image in thedatabase Then an index vector k of the root node or internalnode of the BBT is generated by a top-down approach BBCalgorithm generates k to separate image features into twoclusters of nearly equal size ie 1198621198711 1198621198712 Namely

997888rarrp 119894 ∙ k ge 0 if 997888rarrp 119894 isin 1198621198711997888rarrp 119894 ∙ k lt 0 if 997888rarrp 119894 isin 1198621198712 (7)

Algorithm 1 recursively divides feature vectors of the databaseimages into two clusters until no cluster has more thantwo feature vectors The hierarchical index tree is shown inFigure 2 The level of the root node is 0 and the level of theleaf node is119867 Thus the height of entire index tree is119867 Thesubtree that the root node is at ℎth level of the index tree hasthe height of119867minusℎ The root and internal node 119906 in our BBTare defined as follows

119906 = ⟨119868119863 k 119906119871 119906119877⟩ (8)

where 119868119863 is the identity of node k is the index vector dividingimages on the leaf nodes into two clusters 1198621198711 and 1198621198712 and119906119871 and 119906119877 are the pointers to the left and right child node If119906 is a leaf node it is defined as

119906 = ⟨119868119863 119894119871 119894119871⟩ (9)

where 119894119871 points to the imagesThe search process starts from the root node of UBBT

If 997888rarrq ∙ k ge 0 we execute iterative retrieval upon the leftsubtree otherwise upon the right subtree until obtaining theleaf node

33 SBBT Secure Balanced Binary Tree InUBBT scheme theleaf node is retrieved by the sign of the inner product 997888rarrq ∙k In order to protect the sensitive data 997888rarrq and k we use thesecure inner product (ie ASPE) [8] to encrypt the vectorsand obtain the same result as the plaintext

First of all the owner generates a random invertiblematrix M isin R119889times119889 to encrypt k The user also use it to encrypt997888rarrq ie

V = kMminus1

Q = M997888rarrq T (10)

where V is the ciphertext index and Q is the ciphertext queryIn addition the JPEG images are encrypted via a partialimage encryption [22 25]Therefore index tree I is encryptedas T The leaf node of T is associated with the encryptedimages Then they are outsourced to the cloud for secureimage retrieval When retrieving the user submits Q of thequery image to the cloud Since VQ = 997888rarrq ∙ k the cloud can

6 Security and Communication Networks

Input 997888rarrp 119894 1 le 119894 le 119873 Balanced Binary Clustering ieBBC

Output Balanced Binary Tree I(1) ℎ = 0 the height of node(2) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 1198941le119894le119873) 997888rarr root node(3) function 119862ℎ119894119897119889119903119890119899119866119890119899 (kectors 997888rarrp 119894)(4) 119861119861119862 (997888rarrp 119894) 997888rarr two clusters1198621198711 1198621198712

separating vector as the index k(5) if the size of 1198621198711 ge 2 then(6) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198711) 997888rarr internal node(7) else(8) 119894th image119894isin1198621198711 997888rarr leaf node(9) end if(10) if the size of 1198621198712 ge 2 then(11) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198712) 997888rarr internal node(12) else(13) 119894th image119894isin1198621198712 997888rarr leaf node(14) end if(15) ℎ = ℎ + 1(16) return k 1198621198711 1198621198712(17) end function

Algorithm 1 Building balanced binary tree

leaf node H

internal node h

root node 0

internal node 1

v

CL1 CL2

subtree

Figure 2 The hierarchical structure of unencrypted BBT

use Q to execute an effective search on secure index tree TTherefore the retrieved result and efficiency in the ciphertextwould always be in line with that in the plaintext Note thatthe more details of ASPE are described in [8]

Unfortunately the relationship between 997888rarrq and Q isdeterministic According to the same Q or VQ the cloud cancount the query frequency of 997888rarrq Further according to thesame result the cloud can also count the access frequencyof 997888rarrq Therefore if the query image and 997888rarrq is a one-to-onerelationship the proposed scheme cannot resist the statisticalattack in the known background model

To break the one-to-one relationship between the queryimage and ciphertext query Q we extend 997888rarrq and k as follows

997888rarrq = 120579 [120574 997888rarrq ]k = [0 k] (11)

where the random numbers 120579 gt 0 120574 ≫ 1 Thus thesame query image can generate different ciphertext queryQ by setting 120579 and 120574 Meanwhile VQ = 120579(997888rarrq ∙ k) isblinded by positive 120579The cloud cannot get available statistical

Security and Communication Networks 7

(a) (b) (c)

Figure 3 (a) a query image (b) one encrypted version and (c) another encrypted version

(a) (b) (c) (d)

(e) (f) (g) (h)

Figure 4 In Corel database [10] except for (a)(e) a query image the most similar image by the retrieval based on (b)(f) HSV histogram(c)(g) DCT histogram or (d)(h) an integrated feature

information from the ciphertext query Q and the result VQHowever the sign of VQ remains the same even if 120574 and 120579 aredifferent so the retrieved result and search path remain thesameThe request frequency of a query image can be inferredfrom the access frequency of the same return image

To overcome the issue we first copy the image databaseThe owner uses partial image encryption [23] to protectplaintext image According to different encryption keys sameplaintext image can be encrypted as different encryptedimages as shown in Figure 3 Meanwhile the owner uses theembedding key to hide which encryption key was used [25]Then the owner shares encryption keys and the embeddingkey with authorized users through a secret channel Afterobtaining an encrypted image the user uses the embeddingkey to extract hidden data and select true encryption key torecover the plaintext image

Second different encrypted images derived from thesame plaintext image correspond to different integratedfeatures For the diversity of the integrated feature we useHSV histogram and DCT histogram as basic features Thereason is that HSV histogram in color space and DCThistogram in transform domain are image features fromdifferent perspectives As shown inFigure 4 different features

of the query image can still retrieve a similar image Forease of description we define the new ciphertext databaseas C1015840 Then we set C corresponding to the feature 997888rarrp 119894 =[1205721p1119894 (1 minus 1205721)p2119894 ] and C1015840 corresponds to the feature 997888rarrp 1198941015840 =[1205722p1119894 (1minus1205722)p2119894 ] Finally we use proposed BBC algorithm tobuild an index tree I from 997888rarrp 119894 997888rarrp 1198941015840 C C1015840 and employ ASPEto encrypt the plaintext tree as a ciphertext tree T

To further enhance query privacy we reduce the heightof the index tree T and return multiple encrypted images tothe query user at the one time Specifically we set the internalnode to be a new leaf node Then the new leaf node containsall the encrypted images in the subtree that the internal nodeof T is a root node Due to the fact that the index tree isbalanced leaf nodes of reduced index tree can always pointto multiple encrypted images Please note that some imagesin a subtree are similar which is a compromise with effectiveretrieval

34 Security Analysis We analyze SBBT concerning theprivacy requirements in the design goals

(1) The database and its index are confidential In SBBTindex vectors k of BBT are encrypted as V by the high-dimensional randommatrix which is kept confidential to the

8 Security and Communication Networks

2 3 4 5 6 7 8 9 101 of Category

005

01

015

02

025

03

035

04

Aver

age C

orre

latio

n of

HSV

0

001

002

003

004

095

Aver

age C

orre

latio

n of

DCT

2 3 4 5 6 7 8 9 101 of Category

corrminusintracorrminusintercorrminusdiff

corrminusintracorrminusintercorrminusdiff

Figure 5 The average correlation of HSV histogram (left) and DCT histogram (right) under same category and different categories

cloud For images in the database the partial encryption canprotect their confidentiality [23 25]

(2) Since the feature vector 997888rarrq of the query image isextended by random numbers 120574 and 120579 it can generatedifferent ciphertext queries Q The cloud cannot identify thesame query image from its different ciphertext queries Thequery pattern is protected

(3) According to different 120573 different features of samequery image can retrieve different images When 120573 is equal to1205721 or 1205722 different results can be recovered to the same queryimage Thus the retrieved result and path of the same queryimage are not unique but the retrieval is still effective Thecloud cannot link the different returned image to the samequery image The access pattern is protected

(4) In addition since the height of SBBT is reduced oneleaf node will point to multiple encrypted images It furtherbreaks the one-to-one relationship between the query imageand the return imagesThe access pattern is further protected

In short the SBBT is secure in the known ciphertextmodel and can resist the statistical attack in the knownbackground model

4 Performance Analysis

In this section we show the experimental results andefficiency analysis of the proposed scheme As we knowthat JPEG is the most commonly used image format andaccounting for up to 95 of images on the web [34] sowe adopt the Corel database [10] for the similarity searchThe database contains 1000 JPEG images which belong to 10categories African Beach Architecture Buses DinosaursElephants Flowers Horses Mountain and Food Each ofthem includes 100 images As [5] we split an image into 256blocks and extract hue saturation and value of brightness

of each block Then we perform hierarchical clustering ofblock features to construct the ldquobag of wordrdquo model Theusage frequency of the visual word is the HSV histogramof the image As [22] DCAC coefficients of image blocksare quantized into different bins and the DCT histogram isthe usage frequency of quantized DCAC coefficients Thusthe color feature p1119894 and texture feature p2119894 are set as the991-dimensional HSV histogram and 780-dimensional DCThistogram respectively

41 Parameters Setting First of all we should set parameters1205721 of 997888rarrp 119894 and 1205722 of 997888rarrp 1198941015840 to construct an index tree Becausethe integrated features compose of HSV histograms and DCThistogram we first evaluate average histogram similarity ofthe images in the same category and different categories(denoted as corr-intra and corr-inter respectively) as shownin Figure 5 We can see that the value of corr-intra is largerthan corr-inter thus images of different categories can bedistinguished by each histogram According to the differencebetween corr-intra and corr-inter (ie corr-diff in Figure 5)we set 1205721 = 01 for the integrated feature 997888rarrp 119894 of the databaseC The purpose is to make the HSV histogram and DCThistogram have the same contribution to the search results

For the parameter 1205722 of 997888rarrp 1198941015840 we hope that the index treeof database C C1015840 can better protect the access pattern Weuse two different metrics to measure the privacy protectionachieved by different 1205722 The two metrics are the number ofdifferent image categories in each subtree and the categoryentropy of each subtree Figure 6 shows the average numberof image categories contained in the subtree of differentlevels Because the index tree of the database based ondifferent1205722 has same height of 11 (ie lceillog20002 rceil) the proposedBBC algorithm performs well However we can see from

Security and Communication Networks 9

Table 1 Average category entropy of each level

Parameter Level of the Index Tree1205722 1 2 3 4 5 6 7 8 9 10 1102 299 249 21 175 138 107 081 062 044 021 003 32 282 207 181 143 118 091 07 048 024 004 332 288 228 18 153 128 102 078 054 028 005 332 303 231 183 156 127 103 079 055 03 006 332 291 237 196 165 141 111 089 061 032 007 332 26 21 171 146 124 103 081 057 029 008 332 292 232 185 152 126 104 081 057 03 009 332 292 225 178 149 127 103 083 058 03 0

0

2

4

6

8

10

Aver

age N

umbe

r of C

ateg

orie

s

2 3 4 5 6 7 8 9 10 111Level of Index Tree

2 = 02

2 = 03

2 = 04

2 = 05

2 = 06

2 = 07

2 = 08

2 = 09

Figure 6 The average number of image categories

Figure 6 that the subtree near the leaf node contains a similarnumber of image categories We will use category entropy tomeasure the privacy The category entropy which is definedas minussum119894(119888119894sum119894 119888119894) log2(119888119894sum119894 119888119894) (119888119894 is the number of existing119894 category) is used to measure the randomness of imagecategories in the subtree From Table 1 we know that theindex tree of 1205722 = 06 has the highest average entropy at the3th-10th level even its average category number at 3th-6thlevel is not the biggest in Figure 6 Thus we set the weightedfactor 1205722 = 06 for the integrated feature 997888rarrp 1198941015840 of database C1015840

42 Index Tree Construction In order to prove that ouralgorithm can classify feature vectors in a balanced mannerwe use average similarity of the cluster size to comparethe clustering performance of proposed BBC algorithmbalanced k-means (ie Bkmeans) k-means++ and FSCLThe average similarity based on (5) is defined as 119878 = 119904(120591(119905+1)

119896)

BBCBkmeans

Kmeans++FSCL

0

005

07

075

08

085

09

095

1

Sim

ilarit

y of

Clu

ster S

ize (

S )

5 10 150Level of Index Tree

Figure 7 The average size similarity of two clusters

FromFigure 7 we can see that the index trees generated by theproposed BBC algorithm Bkmeans k-means++ and FSCLhave the height of 11 12 13 and 16 respectively The smallerthe height of the index tree indicates that the proposedalgorithm can classify feature vectors into two clusters moreevenly at each level of the index tree Figure 7 shows thatthe cluster size of FSCL is the most unbalanced at 1th-11thlevels There are two reasons the first is that the change ofthe weight factor affects the stability of the clustering resultand the second is that the value of weight factor impactsthe distance between the cluster center and other featurevectors On contrast due to the use of conscience weightfactor the proposed algorithmperforms better than Bkmeansand k-means++ and the result is much better than FSCLThe reason is that the bipartite graph matching mechanismconstrains a balance of the two clusters and restricts theinfluence of weight factor In short the weight factor is stilleffective in improve the balance of clustering In fact aboveresult is also similar in the database of different 1205721 1205722

10 Security and Communication Networks

Table 2 Minimal Dunn index of each level

Algorithm Level of the Index Tree1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

BBC 047 001 001 002 001 001 001 001 001 002 infin - - - - -Bkmeans 047 001 002 001 002 002 002 003 004 024 073 infin - - - -Kmeans++ 021 001 001 001 001 001 001 001 002 003 022 105 infin - - -FSCL 047 004 005 006 007 01 015 019 023 038 028 058 062 066 095 infin

Table 3 Time cost of index tree construction

Algorithm BBC Bkmeans Kmeans++[20] FSCLTime (s) 111 2316 1749 254

In further we employ Dunn index (DI) [35] to evaluatementioned clustering algorithms The Dunn index is definedas

119863119868 = minforall119909isin1198621198711forall119910isin1198621198712 119863 ⟨119909 119910⟩max119896=12 maxforall119909119910isin119862119871119896 119863 ⟨119909 119910⟩ (12)

where the numerator is the minimal distance of vectors indifferent clusters and the denominator is the largest within-cluster distance Thus we can know that when separatingtwo points into two clusters the denominator is zero andDI is infinite A high value of DI means good compactnessTable 2 shows the minimal DI of clustering results in eachlevel In the same level of the index tree the most of BBCrsquos DIis smaller than other algorithms (ie Bkmeans K-means++and FSCL) Meanwhile BBCrsquos DI of the last level is muchsmaller than other algorithms Obviously for the proposedBBC algorithm the cluster-size similarity is with higherpriority than clustering compactness

The above algorithms are implemented using MATLABon a PC with an Intel Core i5 32 GHz CPU and 16 GB RAMTime consumption for index tree construction is shown inTable 3 In particular we execute 30 times clustering withrandom initial centers and select the result of highest scoreof cluster-size similarity Bkmeans is most time consumingbecause it is not designed to find a vector that splits a datasetevenly In contrast proposed BBC costs the least time

43 Retrieval Performance To evaluate the retrieval perfor-mance each image in the Corel database is selected as a queryimage The retrieval performance is evaluated by averageprecision-recall (PR) curve ie

119875119903119890119888119894119904119894119900119899 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119903119890119905119906119903119899119890119889 119894119898119886119892119890119904

119877119890119888119886119897119897 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119901119900119904i119905119894V119890 119894119898119886119892119890119904

(13)

Figure 8(a) compares the PR curves of the query feature basedon different 120573 values It can be seen that the performanceof different 120573 achieves high precision when recall belongsto (0 005) Besides when only one image is returned ierecall=1 the precision of 120573 = 01 or 06 is 100The reason

is thatwe set120573 as1205721 or1205722 Although two integrated features ofa query image are different the retrieved results are differentencrypted versions of the same plaintext image Since thequery features the corresponding results and search pathsare different the statistical information of query image isprotected

Figure 8(b) shows the retrieval performance of thereduced index trees with different heights It can be seen thatthe precision increases with the height of index tree even if 120573is set as 0 or 1 Further Figure 9 shows the average maximumheight of a reduced index tree inwhich the returned leaf nodecontains the exact query image It can be seen that the heightof reduced index tree is 11 when 120573 is set as 01 or 06 If 120573 is setcloser to 01 or 06 the height of reduced index tree is nearerto 11 When the height is 9 the most of exact query images areretrievable in the reduced index tree and one search returnsfour (22) images Fortunately as shown in Figure 8(b) theprecision is about 70 at the 9th level of the index tree Thusthe retrieval is still effective

44 Privacy Protection We consider the privacy protectionfrom two aspects query pattern and access pattern Since thequery feature is an integrated histogram ie 997888rarrq = [120573q1 (1 minus120573)q2] Different 120573 correspond to different query features Inaddition even if 120573 is determined the feature will be extendedby random numbers 120579 120574 as 997888rarrq = 120579[120574 997888rarrq ] Thus the encryptedfeature is Q = M(120579[120574 120573q1 (1 minus 120573)q2])T Obviously theencrypted feature of a query image is not unique The querypattern of each query image is protected

To protect access pattern the retrieved result of a queryimage should be not uniqueWehave built an index tree basedon features of original database (1205721 = 01) and features of thecopied database (1205722 = 06) Then the integrated features ofdifferent 120573 are used to retrieve in the index tree We know thefact that the returned result is a leaf node of the index treeand different leaf nodes indicate different access paths So wedefine a path privacy to evaluate the protection level of accesspattern

119875119903119894V119886119888119910 = 119900119891 119889119894119891119891119890119903119890119899119905 119904119890119886119903119888ℎ 119901119886119905ℎ119904 119900119891 119886119897119897 119904119890119886119903119888ℎ 119901119886119905ℎ119904 (14)

Different search path can be obtained by using the queryfeature of different 120573 Figures 10(a) and 10(b) show the path

Security and Communication Networks 11

=0=0025=005=0075=01=03

=035=04=06=08=1

40

50

60

70

80

90

100Pr

ecisi

on (

)

5 10 15 20 25 300Recall ()

(a)

0 2 4 6 8 10 120

20

40

60

80

100

Height of Index Tree

Prec

ision

()

=0=0025=005=0075=01=03

=035=04=06=08=1

(b)

Figure 8 (a) Precision-recall curves of original index tree (b) Retrieval precision of reduced index trees

0 0025 005 0075 01 03 035 04 06 08 1

Parameter

8

9

10

11

12

Hei

ght o

f Red

uced

Inde

x Tr

ee

Figure 9 Under different 120573 the average maximum height of areduced index tree that the query image can retrieve itself

privacy of different 120573 where 120573 = 01 and 06 are baselinesrespectively It can be seen that the path privacy is 100when120573 is set as 06 in Figure 10(a) and 01 in Figure 10(b) Thereason is that retrieval is separately carried out in the left orright subtree of the root node Figure 10(a) also shows thatsearch paths are in the same subtree of the root node when120573 is set to 06 08 or 1 But the search path of 120573 = 08 or1 is still different with 120573 = 06 as shown in Figure 10(b)Similarly when 120573 is set to 0 00025 005 0075 01 03 or035 the search path is in the same subtree of the root nodebut still different Besides when 120573 is set to 04 the searchtraverses in both subtrees of the root node In addition thepath privacy increases with the height of the tree In short the

access pattern can be protected by setting different 120573 Pleasenote that usage frequency of different 120573 can be optimizedbased on the frequency distribution of the query images

In summary Table 4 compares the proposed scheme withother schemes in the terms of search efficiency query patternprotection and access pattern protection We compare thesearch efficiency of different schemes by theoretical analysisinstead of experiments The reason is that encrypted imageretrieval schemes are based on different image databasesaccording to the application Also some schemes utilizeencrypted image feature but other schemes employ the hashcode of the image feature

In the schemes under vector space model such as Yuan[20] a skew index tree leads to119874(log119873 sim 119873) search time Xia[19] uses local-sensitive hash (LSH) to construct a prefiltertable Then with corresponding image feature the cloudcarries out a refinement of candidate results Thus it is asublinear scheme Cheng [22 23] does not build an indextree so the retrieval efficiency is 119874(119873) Weng [6 7] usesa hash table to improve the search efficiency (ie 119874(1 sim119873)) In short the index tree or hash table is a trade-offbetween space and efficiency Although Wengrsquos scheme [7]and proposed scheme both protect query pattern and accesspattern the hash code and feature vector are respectivelyused SpecificallyWeng [7] omits certain bits of the hash codeof query image and returns multiple candidates to the userfor refinement The user is involved in completing retrievaland obtaining the accurate result In our scheme the usercan set 120573 as 1205721 or 1205722 to retrieve exact query image throughdifferent paths Unlike [7] our scheme does not need the userto participate in the feature comparison However to protect

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 6: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

6 Security and Communication Networks

Input 997888rarrp 119894 1 le 119894 le 119873 Balanced Binary Clustering ieBBC

Output Balanced Binary Tree I(1) ℎ = 0 the height of node(2) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 1198941le119894le119873) 997888rarr root node(3) function 119862ℎ119894119897119889119903119890119899119866119890119899 (kectors 997888rarrp 119894)(4) 119861119861119862 (997888rarrp 119894) 997888rarr two clusters1198621198711 1198621198712

separating vector as the index k(5) if the size of 1198621198711 ge 2 then(6) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198711) 997888rarr internal node(7) else(8) 119894th image119894isin1198621198711 997888rarr leaf node(9) end if(10) if the size of 1198621198712 ge 2 then(11) 119862ℎ119894119897119889119903119890119899119866119890119899 (997888rarrp 119894119894isin1198621198712) 997888rarr internal node(12) else(13) 119894th image119894isin1198621198712 997888rarr leaf node(14) end if(15) ℎ = ℎ + 1(16) return k 1198621198711 1198621198712(17) end function

Algorithm 1 Building balanced binary tree

leaf node H

internal node h

root node 0

internal node 1

v

CL1 CL2

subtree

Figure 2 The hierarchical structure of unencrypted BBT

use Q to execute an effective search on secure index tree TTherefore the retrieved result and efficiency in the ciphertextwould always be in line with that in the plaintext Note thatthe more details of ASPE are described in [8]

Unfortunately the relationship between 997888rarrq and Q isdeterministic According to the same Q or VQ the cloud cancount the query frequency of 997888rarrq Further according to thesame result the cloud can also count the access frequencyof 997888rarrq Therefore if the query image and 997888rarrq is a one-to-onerelationship the proposed scheme cannot resist the statisticalattack in the known background model

To break the one-to-one relationship between the queryimage and ciphertext query Q we extend 997888rarrq and k as follows

997888rarrq = 120579 [120574 997888rarrq ]k = [0 k] (11)

where the random numbers 120579 gt 0 120574 ≫ 1 Thus thesame query image can generate different ciphertext queryQ by setting 120579 and 120574 Meanwhile VQ = 120579(997888rarrq ∙ k) isblinded by positive 120579The cloud cannot get available statistical

Security and Communication Networks 7

(a) (b) (c)

Figure 3 (a) a query image (b) one encrypted version and (c) another encrypted version

(a) (b) (c) (d)

(e) (f) (g) (h)

Figure 4 In Corel database [10] except for (a)(e) a query image the most similar image by the retrieval based on (b)(f) HSV histogram(c)(g) DCT histogram or (d)(h) an integrated feature

information from the ciphertext query Q and the result VQHowever the sign of VQ remains the same even if 120574 and 120579 aredifferent so the retrieved result and search path remain thesameThe request frequency of a query image can be inferredfrom the access frequency of the same return image

To overcome the issue we first copy the image databaseThe owner uses partial image encryption [23] to protectplaintext image According to different encryption keys sameplaintext image can be encrypted as different encryptedimages as shown in Figure 3 Meanwhile the owner uses theembedding key to hide which encryption key was used [25]Then the owner shares encryption keys and the embeddingkey with authorized users through a secret channel Afterobtaining an encrypted image the user uses the embeddingkey to extract hidden data and select true encryption key torecover the plaintext image

Second different encrypted images derived from thesame plaintext image correspond to different integratedfeatures For the diversity of the integrated feature we useHSV histogram and DCT histogram as basic features Thereason is that HSV histogram in color space and DCThistogram in transform domain are image features fromdifferent perspectives As shown inFigure 4 different features

of the query image can still retrieve a similar image Forease of description we define the new ciphertext databaseas C1015840 Then we set C corresponding to the feature 997888rarrp 119894 =[1205721p1119894 (1 minus 1205721)p2119894 ] and C1015840 corresponds to the feature 997888rarrp 1198941015840 =[1205722p1119894 (1minus1205722)p2119894 ] Finally we use proposed BBC algorithm tobuild an index tree I from 997888rarrp 119894 997888rarrp 1198941015840 C C1015840 and employ ASPEto encrypt the plaintext tree as a ciphertext tree T

To further enhance query privacy we reduce the heightof the index tree T and return multiple encrypted images tothe query user at the one time Specifically we set the internalnode to be a new leaf node Then the new leaf node containsall the encrypted images in the subtree that the internal nodeof T is a root node Due to the fact that the index tree isbalanced leaf nodes of reduced index tree can always pointto multiple encrypted images Please note that some imagesin a subtree are similar which is a compromise with effectiveretrieval

34 Security Analysis We analyze SBBT concerning theprivacy requirements in the design goals

(1) The database and its index are confidential In SBBTindex vectors k of BBT are encrypted as V by the high-dimensional randommatrix which is kept confidential to the

8 Security and Communication Networks

2 3 4 5 6 7 8 9 101 of Category

005

01

015

02

025

03

035

04

Aver

age C

orre

latio

n of

HSV

0

001

002

003

004

095

Aver

age C

orre

latio

n of

DCT

2 3 4 5 6 7 8 9 101 of Category

corrminusintracorrminusintercorrminusdiff

corrminusintracorrminusintercorrminusdiff

Figure 5 The average correlation of HSV histogram (left) and DCT histogram (right) under same category and different categories

cloud For images in the database the partial encryption canprotect their confidentiality [23 25]

(2) Since the feature vector 997888rarrq of the query image isextended by random numbers 120574 and 120579 it can generatedifferent ciphertext queries Q The cloud cannot identify thesame query image from its different ciphertext queries Thequery pattern is protected

(3) According to different 120573 different features of samequery image can retrieve different images When 120573 is equal to1205721 or 1205722 different results can be recovered to the same queryimage Thus the retrieved result and path of the same queryimage are not unique but the retrieval is still effective Thecloud cannot link the different returned image to the samequery image The access pattern is protected

(4) In addition since the height of SBBT is reduced oneleaf node will point to multiple encrypted images It furtherbreaks the one-to-one relationship between the query imageand the return imagesThe access pattern is further protected

In short the SBBT is secure in the known ciphertextmodel and can resist the statistical attack in the knownbackground model

4 Performance Analysis

In this section we show the experimental results andefficiency analysis of the proposed scheme As we knowthat JPEG is the most commonly used image format andaccounting for up to 95 of images on the web [34] sowe adopt the Corel database [10] for the similarity searchThe database contains 1000 JPEG images which belong to 10categories African Beach Architecture Buses DinosaursElephants Flowers Horses Mountain and Food Each ofthem includes 100 images As [5] we split an image into 256blocks and extract hue saturation and value of brightness

of each block Then we perform hierarchical clustering ofblock features to construct the ldquobag of wordrdquo model Theusage frequency of the visual word is the HSV histogramof the image As [22] DCAC coefficients of image blocksare quantized into different bins and the DCT histogram isthe usage frequency of quantized DCAC coefficients Thusthe color feature p1119894 and texture feature p2119894 are set as the991-dimensional HSV histogram and 780-dimensional DCThistogram respectively

41 Parameters Setting First of all we should set parameters1205721 of 997888rarrp 119894 and 1205722 of 997888rarrp 1198941015840 to construct an index tree Becausethe integrated features compose of HSV histograms and DCThistogram we first evaluate average histogram similarity ofthe images in the same category and different categories(denoted as corr-intra and corr-inter respectively) as shownin Figure 5 We can see that the value of corr-intra is largerthan corr-inter thus images of different categories can bedistinguished by each histogram According to the differencebetween corr-intra and corr-inter (ie corr-diff in Figure 5)we set 1205721 = 01 for the integrated feature 997888rarrp 119894 of the databaseC The purpose is to make the HSV histogram and DCThistogram have the same contribution to the search results

For the parameter 1205722 of 997888rarrp 1198941015840 we hope that the index treeof database C C1015840 can better protect the access pattern Weuse two different metrics to measure the privacy protectionachieved by different 1205722 The two metrics are the number ofdifferent image categories in each subtree and the categoryentropy of each subtree Figure 6 shows the average numberof image categories contained in the subtree of differentlevels Because the index tree of the database based ondifferent1205722 has same height of 11 (ie lceillog20002 rceil) the proposedBBC algorithm performs well However we can see from

Security and Communication Networks 9

Table 1 Average category entropy of each level

Parameter Level of the Index Tree1205722 1 2 3 4 5 6 7 8 9 10 1102 299 249 21 175 138 107 081 062 044 021 003 32 282 207 181 143 118 091 07 048 024 004 332 288 228 18 153 128 102 078 054 028 005 332 303 231 183 156 127 103 079 055 03 006 332 291 237 196 165 141 111 089 061 032 007 332 26 21 171 146 124 103 081 057 029 008 332 292 232 185 152 126 104 081 057 03 009 332 292 225 178 149 127 103 083 058 03 0

0

2

4

6

8

10

Aver

age N

umbe

r of C

ateg

orie

s

2 3 4 5 6 7 8 9 10 111Level of Index Tree

2 = 02

2 = 03

2 = 04

2 = 05

2 = 06

2 = 07

2 = 08

2 = 09

Figure 6 The average number of image categories

Figure 6 that the subtree near the leaf node contains a similarnumber of image categories We will use category entropy tomeasure the privacy The category entropy which is definedas minussum119894(119888119894sum119894 119888119894) log2(119888119894sum119894 119888119894) (119888119894 is the number of existing119894 category) is used to measure the randomness of imagecategories in the subtree From Table 1 we know that theindex tree of 1205722 = 06 has the highest average entropy at the3th-10th level even its average category number at 3th-6thlevel is not the biggest in Figure 6 Thus we set the weightedfactor 1205722 = 06 for the integrated feature 997888rarrp 1198941015840 of database C1015840

42 Index Tree Construction In order to prove that ouralgorithm can classify feature vectors in a balanced mannerwe use average similarity of the cluster size to comparethe clustering performance of proposed BBC algorithmbalanced k-means (ie Bkmeans) k-means++ and FSCLThe average similarity based on (5) is defined as 119878 = 119904(120591(119905+1)

119896)

BBCBkmeans

Kmeans++FSCL

0

005

07

075

08

085

09

095

1

Sim

ilarit

y of

Clu

ster S

ize (

S )

5 10 150Level of Index Tree

Figure 7 The average size similarity of two clusters

FromFigure 7 we can see that the index trees generated by theproposed BBC algorithm Bkmeans k-means++ and FSCLhave the height of 11 12 13 and 16 respectively The smallerthe height of the index tree indicates that the proposedalgorithm can classify feature vectors into two clusters moreevenly at each level of the index tree Figure 7 shows thatthe cluster size of FSCL is the most unbalanced at 1th-11thlevels There are two reasons the first is that the change ofthe weight factor affects the stability of the clustering resultand the second is that the value of weight factor impactsthe distance between the cluster center and other featurevectors On contrast due to the use of conscience weightfactor the proposed algorithmperforms better than Bkmeansand k-means++ and the result is much better than FSCLThe reason is that the bipartite graph matching mechanismconstrains a balance of the two clusters and restricts theinfluence of weight factor In short the weight factor is stilleffective in improve the balance of clustering In fact aboveresult is also similar in the database of different 1205721 1205722

10 Security and Communication Networks

Table 2 Minimal Dunn index of each level

Algorithm Level of the Index Tree1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

BBC 047 001 001 002 001 001 001 001 001 002 infin - - - - -Bkmeans 047 001 002 001 002 002 002 003 004 024 073 infin - - - -Kmeans++ 021 001 001 001 001 001 001 001 002 003 022 105 infin - - -FSCL 047 004 005 006 007 01 015 019 023 038 028 058 062 066 095 infin

Table 3 Time cost of index tree construction

Algorithm BBC Bkmeans Kmeans++[20] FSCLTime (s) 111 2316 1749 254

In further we employ Dunn index (DI) [35] to evaluatementioned clustering algorithms The Dunn index is definedas

119863119868 = minforall119909isin1198621198711forall119910isin1198621198712 119863 ⟨119909 119910⟩max119896=12 maxforall119909119910isin119862119871119896 119863 ⟨119909 119910⟩ (12)

where the numerator is the minimal distance of vectors indifferent clusters and the denominator is the largest within-cluster distance Thus we can know that when separatingtwo points into two clusters the denominator is zero andDI is infinite A high value of DI means good compactnessTable 2 shows the minimal DI of clustering results in eachlevel In the same level of the index tree the most of BBCrsquos DIis smaller than other algorithms (ie Bkmeans K-means++and FSCL) Meanwhile BBCrsquos DI of the last level is muchsmaller than other algorithms Obviously for the proposedBBC algorithm the cluster-size similarity is with higherpriority than clustering compactness

The above algorithms are implemented using MATLABon a PC with an Intel Core i5 32 GHz CPU and 16 GB RAMTime consumption for index tree construction is shown inTable 3 In particular we execute 30 times clustering withrandom initial centers and select the result of highest scoreof cluster-size similarity Bkmeans is most time consumingbecause it is not designed to find a vector that splits a datasetevenly In contrast proposed BBC costs the least time

43 Retrieval Performance To evaluate the retrieval perfor-mance each image in the Corel database is selected as a queryimage The retrieval performance is evaluated by averageprecision-recall (PR) curve ie

119875119903119890119888119894119904119894119900119899 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119903119890119905119906119903119899119890119889 119894119898119886119892119890119904

119877119890119888119886119897119897 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119901119900119904i119905119894V119890 119894119898119886119892119890119904

(13)

Figure 8(a) compares the PR curves of the query feature basedon different 120573 values It can be seen that the performanceof different 120573 achieves high precision when recall belongsto (0 005) Besides when only one image is returned ierecall=1 the precision of 120573 = 01 or 06 is 100The reason

is thatwe set120573 as1205721 or1205722 Although two integrated features ofa query image are different the retrieved results are differentencrypted versions of the same plaintext image Since thequery features the corresponding results and search pathsare different the statistical information of query image isprotected

Figure 8(b) shows the retrieval performance of thereduced index trees with different heights It can be seen thatthe precision increases with the height of index tree even if 120573is set as 0 or 1 Further Figure 9 shows the average maximumheight of a reduced index tree inwhich the returned leaf nodecontains the exact query image It can be seen that the heightof reduced index tree is 11 when 120573 is set as 01 or 06 If 120573 is setcloser to 01 or 06 the height of reduced index tree is nearerto 11 When the height is 9 the most of exact query images areretrievable in the reduced index tree and one search returnsfour (22) images Fortunately as shown in Figure 8(b) theprecision is about 70 at the 9th level of the index tree Thusthe retrieval is still effective

44 Privacy Protection We consider the privacy protectionfrom two aspects query pattern and access pattern Since thequery feature is an integrated histogram ie 997888rarrq = [120573q1 (1 minus120573)q2] Different 120573 correspond to different query features Inaddition even if 120573 is determined the feature will be extendedby random numbers 120579 120574 as 997888rarrq = 120579[120574 997888rarrq ] Thus the encryptedfeature is Q = M(120579[120574 120573q1 (1 minus 120573)q2])T Obviously theencrypted feature of a query image is not unique The querypattern of each query image is protected

To protect access pattern the retrieved result of a queryimage should be not uniqueWehave built an index tree basedon features of original database (1205721 = 01) and features of thecopied database (1205722 = 06) Then the integrated features ofdifferent 120573 are used to retrieve in the index tree We know thefact that the returned result is a leaf node of the index treeand different leaf nodes indicate different access paths So wedefine a path privacy to evaluate the protection level of accesspattern

119875119903119894V119886119888119910 = 119900119891 119889119894119891119891119890119903119890119899119905 119904119890119886119903119888ℎ 119901119886119905ℎ119904 119900119891 119886119897119897 119904119890119886119903119888ℎ 119901119886119905ℎ119904 (14)

Different search path can be obtained by using the queryfeature of different 120573 Figures 10(a) and 10(b) show the path

Security and Communication Networks 11

=0=0025=005=0075=01=03

=035=04=06=08=1

40

50

60

70

80

90

100Pr

ecisi

on (

)

5 10 15 20 25 300Recall ()

(a)

0 2 4 6 8 10 120

20

40

60

80

100

Height of Index Tree

Prec

ision

()

=0=0025=005=0075=01=03

=035=04=06=08=1

(b)

Figure 8 (a) Precision-recall curves of original index tree (b) Retrieval precision of reduced index trees

0 0025 005 0075 01 03 035 04 06 08 1

Parameter

8

9

10

11

12

Hei

ght o

f Red

uced

Inde

x Tr

ee

Figure 9 Under different 120573 the average maximum height of areduced index tree that the query image can retrieve itself

privacy of different 120573 where 120573 = 01 and 06 are baselinesrespectively It can be seen that the path privacy is 100when120573 is set as 06 in Figure 10(a) and 01 in Figure 10(b) Thereason is that retrieval is separately carried out in the left orright subtree of the root node Figure 10(a) also shows thatsearch paths are in the same subtree of the root node when120573 is set to 06 08 or 1 But the search path of 120573 = 08 or1 is still different with 120573 = 06 as shown in Figure 10(b)Similarly when 120573 is set to 0 00025 005 0075 01 03 or035 the search path is in the same subtree of the root nodebut still different Besides when 120573 is set to 04 the searchtraverses in both subtrees of the root node In addition thepath privacy increases with the height of the tree In short the

access pattern can be protected by setting different 120573 Pleasenote that usage frequency of different 120573 can be optimizedbased on the frequency distribution of the query images

In summary Table 4 compares the proposed scheme withother schemes in the terms of search efficiency query patternprotection and access pattern protection We compare thesearch efficiency of different schemes by theoretical analysisinstead of experiments The reason is that encrypted imageretrieval schemes are based on different image databasesaccording to the application Also some schemes utilizeencrypted image feature but other schemes employ the hashcode of the image feature

In the schemes under vector space model such as Yuan[20] a skew index tree leads to119874(log119873 sim 119873) search time Xia[19] uses local-sensitive hash (LSH) to construct a prefiltertable Then with corresponding image feature the cloudcarries out a refinement of candidate results Thus it is asublinear scheme Cheng [22 23] does not build an indextree so the retrieval efficiency is 119874(119873) Weng [6 7] usesa hash table to improve the search efficiency (ie 119874(1 sim119873)) In short the index tree or hash table is a trade-offbetween space and efficiency Although Wengrsquos scheme [7]and proposed scheme both protect query pattern and accesspattern the hash code and feature vector are respectivelyused SpecificallyWeng [7] omits certain bits of the hash codeof query image and returns multiple candidates to the userfor refinement The user is involved in completing retrievaland obtaining the accurate result In our scheme the usercan set 120573 as 1205721 or 1205722 to retrieve exact query image throughdifferent paths Unlike [7] our scheme does not need the userto participate in the feature comparison However to protect

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 7: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

Security and Communication Networks 7

(a) (b) (c)

Figure 3 (a) a query image (b) one encrypted version and (c) another encrypted version

(a) (b) (c) (d)

(e) (f) (g) (h)

Figure 4 In Corel database [10] except for (a)(e) a query image the most similar image by the retrieval based on (b)(f) HSV histogram(c)(g) DCT histogram or (d)(h) an integrated feature

information from the ciphertext query Q and the result VQHowever the sign of VQ remains the same even if 120574 and 120579 aredifferent so the retrieved result and search path remain thesameThe request frequency of a query image can be inferredfrom the access frequency of the same return image

To overcome the issue we first copy the image databaseThe owner uses partial image encryption [23] to protectplaintext image According to different encryption keys sameplaintext image can be encrypted as different encryptedimages as shown in Figure 3 Meanwhile the owner uses theembedding key to hide which encryption key was used [25]Then the owner shares encryption keys and the embeddingkey with authorized users through a secret channel Afterobtaining an encrypted image the user uses the embeddingkey to extract hidden data and select true encryption key torecover the plaintext image

Second different encrypted images derived from thesame plaintext image correspond to different integratedfeatures For the diversity of the integrated feature we useHSV histogram and DCT histogram as basic features Thereason is that HSV histogram in color space and DCThistogram in transform domain are image features fromdifferent perspectives As shown inFigure 4 different features

of the query image can still retrieve a similar image Forease of description we define the new ciphertext databaseas C1015840 Then we set C corresponding to the feature 997888rarrp 119894 =[1205721p1119894 (1 minus 1205721)p2119894 ] and C1015840 corresponds to the feature 997888rarrp 1198941015840 =[1205722p1119894 (1minus1205722)p2119894 ] Finally we use proposed BBC algorithm tobuild an index tree I from 997888rarrp 119894 997888rarrp 1198941015840 C C1015840 and employ ASPEto encrypt the plaintext tree as a ciphertext tree T

To further enhance query privacy we reduce the heightof the index tree T and return multiple encrypted images tothe query user at the one time Specifically we set the internalnode to be a new leaf node Then the new leaf node containsall the encrypted images in the subtree that the internal nodeof T is a root node Due to the fact that the index tree isbalanced leaf nodes of reduced index tree can always pointto multiple encrypted images Please note that some imagesin a subtree are similar which is a compromise with effectiveretrieval

34 Security Analysis We analyze SBBT concerning theprivacy requirements in the design goals

(1) The database and its index are confidential In SBBTindex vectors k of BBT are encrypted as V by the high-dimensional randommatrix which is kept confidential to the

8 Security and Communication Networks

2 3 4 5 6 7 8 9 101 of Category

005

01

015

02

025

03

035

04

Aver

age C

orre

latio

n of

HSV

0

001

002

003

004

095

Aver

age C

orre

latio

n of

DCT

2 3 4 5 6 7 8 9 101 of Category

corrminusintracorrminusintercorrminusdiff

corrminusintracorrminusintercorrminusdiff

Figure 5 The average correlation of HSV histogram (left) and DCT histogram (right) under same category and different categories

cloud For images in the database the partial encryption canprotect their confidentiality [23 25]

(2) Since the feature vector 997888rarrq of the query image isextended by random numbers 120574 and 120579 it can generatedifferent ciphertext queries Q The cloud cannot identify thesame query image from its different ciphertext queries Thequery pattern is protected

(3) According to different 120573 different features of samequery image can retrieve different images When 120573 is equal to1205721 or 1205722 different results can be recovered to the same queryimage Thus the retrieved result and path of the same queryimage are not unique but the retrieval is still effective Thecloud cannot link the different returned image to the samequery image The access pattern is protected

(4) In addition since the height of SBBT is reduced oneleaf node will point to multiple encrypted images It furtherbreaks the one-to-one relationship between the query imageand the return imagesThe access pattern is further protected

In short the SBBT is secure in the known ciphertextmodel and can resist the statistical attack in the knownbackground model

4 Performance Analysis

In this section we show the experimental results andefficiency analysis of the proposed scheme As we knowthat JPEG is the most commonly used image format andaccounting for up to 95 of images on the web [34] sowe adopt the Corel database [10] for the similarity searchThe database contains 1000 JPEG images which belong to 10categories African Beach Architecture Buses DinosaursElephants Flowers Horses Mountain and Food Each ofthem includes 100 images As [5] we split an image into 256blocks and extract hue saturation and value of brightness

of each block Then we perform hierarchical clustering ofblock features to construct the ldquobag of wordrdquo model Theusage frequency of the visual word is the HSV histogramof the image As [22] DCAC coefficients of image blocksare quantized into different bins and the DCT histogram isthe usage frequency of quantized DCAC coefficients Thusthe color feature p1119894 and texture feature p2119894 are set as the991-dimensional HSV histogram and 780-dimensional DCThistogram respectively

41 Parameters Setting First of all we should set parameters1205721 of 997888rarrp 119894 and 1205722 of 997888rarrp 1198941015840 to construct an index tree Becausethe integrated features compose of HSV histograms and DCThistogram we first evaluate average histogram similarity ofthe images in the same category and different categories(denoted as corr-intra and corr-inter respectively) as shownin Figure 5 We can see that the value of corr-intra is largerthan corr-inter thus images of different categories can bedistinguished by each histogram According to the differencebetween corr-intra and corr-inter (ie corr-diff in Figure 5)we set 1205721 = 01 for the integrated feature 997888rarrp 119894 of the databaseC The purpose is to make the HSV histogram and DCThistogram have the same contribution to the search results

For the parameter 1205722 of 997888rarrp 1198941015840 we hope that the index treeof database C C1015840 can better protect the access pattern Weuse two different metrics to measure the privacy protectionachieved by different 1205722 The two metrics are the number ofdifferent image categories in each subtree and the categoryentropy of each subtree Figure 6 shows the average numberof image categories contained in the subtree of differentlevels Because the index tree of the database based ondifferent1205722 has same height of 11 (ie lceillog20002 rceil) the proposedBBC algorithm performs well However we can see from

Security and Communication Networks 9

Table 1 Average category entropy of each level

Parameter Level of the Index Tree1205722 1 2 3 4 5 6 7 8 9 10 1102 299 249 21 175 138 107 081 062 044 021 003 32 282 207 181 143 118 091 07 048 024 004 332 288 228 18 153 128 102 078 054 028 005 332 303 231 183 156 127 103 079 055 03 006 332 291 237 196 165 141 111 089 061 032 007 332 26 21 171 146 124 103 081 057 029 008 332 292 232 185 152 126 104 081 057 03 009 332 292 225 178 149 127 103 083 058 03 0

0

2

4

6

8

10

Aver

age N

umbe

r of C

ateg

orie

s

2 3 4 5 6 7 8 9 10 111Level of Index Tree

2 = 02

2 = 03

2 = 04

2 = 05

2 = 06

2 = 07

2 = 08

2 = 09

Figure 6 The average number of image categories

Figure 6 that the subtree near the leaf node contains a similarnumber of image categories We will use category entropy tomeasure the privacy The category entropy which is definedas minussum119894(119888119894sum119894 119888119894) log2(119888119894sum119894 119888119894) (119888119894 is the number of existing119894 category) is used to measure the randomness of imagecategories in the subtree From Table 1 we know that theindex tree of 1205722 = 06 has the highest average entropy at the3th-10th level even its average category number at 3th-6thlevel is not the biggest in Figure 6 Thus we set the weightedfactor 1205722 = 06 for the integrated feature 997888rarrp 1198941015840 of database C1015840

42 Index Tree Construction In order to prove that ouralgorithm can classify feature vectors in a balanced mannerwe use average similarity of the cluster size to comparethe clustering performance of proposed BBC algorithmbalanced k-means (ie Bkmeans) k-means++ and FSCLThe average similarity based on (5) is defined as 119878 = 119904(120591(119905+1)

119896)

BBCBkmeans

Kmeans++FSCL

0

005

07

075

08

085

09

095

1

Sim

ilarit

y of

Clu

ster S

ize (

S )

5 10 150Level of Index Tree

Figure 7 The average size similarity of two clusters

FromFigure 7 we can see that the index trees generated by theproposed BBC algorithm Bkmeans k-means++ and FSCLhave the height of 11 12 13 and 16 respectively The smallerthe height of the index tree indicates that the proposedalgorithm can classify feature vectors into two clusters moreevenly at each level of the index tree Figure 7 shows thatthe cluster size of FSCL is the most unbalanced at 1th-11thlevels There are two reasons the first is that the change ofthe weight factor affects the stability of the clustering resultand the second is that the value of weight factor impactsthe distance between the cluster center and other featurevectors On contrast due to the use of conscience weightfactor the proposed algorithmperforms better than Bkmeansand k-means++ and the result is much better than FSCLThe reason is that the bipartite graph matching mechanismconstrains a balance of the two clusters and restricts theinfluence of weight factor In short the weight factor is stilleffective in improve the balance of clustering In fact aboveresult is also similar in the database of different 1205721 1205722

10 Security and Communication Networks

Table 2 Minimal Dunn index of each level

Algorithm Level of the Index Tree1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

BBC 047 001 001 002 001 001 001 001 001 002 infin - - - - -Bkmeans 047 001 002 001 002 002 002 003 004 024 073 infin - - - -Kmeans++ 021 001 001 001 001 001 001 001 002 003 022 105 infin - - -FSCL 047 004 005 006 007 01 015 019 023 038 028 058 062 066 095 infin

Table 3 Time cost of index tree construction

Algorithm BBC Bkmeans Kmeans++[20] FSCLTime (s) 111 2316 1749 254

In further we employ Dunn index (DI) [35] to evaluatementioned clustering algorithms The Dunn index is definedas

119863119868 = minforall119909isin1198621198711forall119910isin1198621198712 119863 ⟨119909 119910⟩max119896=12 maxforall119909119910isin119862119871119896 119863 ⟨119909 119910⟩ (12)

where the numerator is the minimal distance of vectors indifferent clusters and the denominator is the largest within-cluster distance Thus we can know that when separatingtwo points into two clusters the denominator is zero andDI is infinite A high value of DI means good compactnessTable 2 shows the minimal DI of clustering results in eachlevel In the same level of the index tree the most of BBCrsquos DIis smaller than other algorithms (ie Bkmeans K-means++and FSCL) Meanwhile BBCrsquos DI of the last level is muchsmaller than other algorithms Obviously for the proposedBBC algorithm the cluster-size similarity is with higherpriority than clustering compactness

The above algorithms are implemented using MATLABon a PC with an Intel Core i5 32 GHz CPU and 16 GB RAMTime consumption for index tree construction is shown inTable 3 In particular we execute 30 times clustering withrandom initial centers and select the result of highest scoreof cluster-size similarity Bkmeans is most time consumingbecause it is not designed to find a vector that splits a datasetevenly In contrast proposed BBC costs the least time

43 Retrieval Performance To evaluate the retrieval perfor-mance each image in the Corel database is selected as a queryimage The retrieval performance is evaluated by averageprecision-recall (PR) curve ie

119875119903119890119888119894119904119894119900119899 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119903119890119905119906119903119899119890119889 119894119898119886119892119890119904

119877119890119888119886119897119897 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119901119900119904i119905119894V119890 119894119898119886119892119890119904

(13)

Figure 8(a) compares the PR curves of the query feature basedon different 120573 values It can be seen that the performanceof different 120573 achieves high precision when recall belongsto (0 005) Besides when only one image is returned ierecall=1 the precision of 120573 = 01 or 06 is 100The reason

is thatwe set120573 as1205721 or1205722 Although two integrated features ofa query image are different the retrieved results are differentencrypted versions of the same plaintext image Since thequery features the corresponding results and search pathsare different the statistical information of query image isprotected

Figure 8(b) shows the retrieval performance of thereduced index trees with different heights It can be seen thatthe precision increases with the height of index tree even if 120573is set as 0 or 1 Further Figure 9 shows the average maximumheight of a reduced index tree inwhich the returned leaf nodecontains the exact query image It can be seen that the heightof reduced index tree is 11 when 120573 is set as 01 or 06 If 120573 is setcloser to 01 or 06 the height of reduced index tree is nearerto 11 When the height is 9 the most of exact query images areretrievable in the reduced index tree and one search returnsfour (22) images Fortunately as shown in Figure 8(b) theprecision is about 70 at the 9th level of the index tree Thusthe retrieval is still effective

44 Privacy Protection We consider the privacy protectionfrom two aspects query pattern and access pattern Since thequery feature is an integrated histogram ie 997888rarrq = [120573q1 (1 minus120573)q2] Different 120573 correspond to different query features Inaddition even if 120573 is determined the feature will be extendedby random numbers 120579 120574 as 997888rarrq = 120579[120574 997888rarrq ] Thus the encryptedfeature is Q = M(120579[120574 120573q1 (1 minus 120573)q2])T Obviously theencrypted feature of a query image is not unique The querypattern of each query image is protected

To protect access pattern the retrieved result of a queryimage should be not uniqueWehave built an index tree basedon features of original database (1205721 = 01) and features of thecopied database (1205722 = 06) Then the integrated features ofdifferent 120573 are used to retrieve in the index tree We know thefact that the returned result is a leaf node of the index treeand different leaf nodes indicate different access paths So wedefine a path privacy to evaluate the protection level of accesspattern

119875119903119894V119886119888119910 = 119900119891 119889119894119891119891119890119903119890119899119905 119904119890119886119903119888ℎ 119901119886119905ℎ119904 119900119891 119886119897119897 119904119890119886119903119888ℎ 119901119886119905ℎ119904 (14)

Different search path can be obtained by using the queryfeature of different 120573 Figures 10(a) and 10(b) show the path

Security and Communication Networks 11

=0=0025=005=0075=01=03

=035=04=06=08=1

40

50

60

70

80

90

100Pr

ecisi

on (

)

5 10 15 20 25 300Recall ()

(a)

0 2 4 6 8 10 120

20

40

60

80

100

Height of Index Tree

Prec

ision

()

=0=0025=005=0075=01=03

=035=04=06=08=1

(b)

Figure 8 (a) Precision-recall curves of original index tree (b) Retrieval precision of reduced index trees

0 0025 005 0075 01 03 035 04 06 08 1

Parameter

8

9

10

11

12

Hei

ght o

f Red

uced

Inde

x Tr

ee

Figure 9 Under different 120573 the average maximum height of areduced index tree that the query image can retrieve itself

privacy of different 120573 where 120573 = 01 and 06 are baselinesrespectively It can be seen that the path privacy is 100when120573 is set as 06 in Figure 10(a) and 01 in Figure 10(b) Thereason is that retrieval is separately carried out in the left orright subtree of the root node Figure 10(a) also shows thatsearch paths are in the same subtree of the root node when120573 is set to 06 08 or 1 But the search path of 120573 = 08 or1 is still different with 120573 = 06 as shown in Figure 10(b)Similarly when 120573 is set to 0 00025 005 0075 01 03 or035 the search path is in the same subtree of the root nodebut still different Besides when 120573 is set to 04 the searchtraverses in both subtrees of the root node In addition thepath privacy increases with the height of the tree In short the

access pattern can be protected by setting different 120573 Pleasenote that usage frequency of different 120573 can be optimizedbased on the frequency distribution of the query images

In summary Table 4 compares the proposed scheme withother schemes in the terms of search efficiency query patternprotection and access pattern protection We compare thesearch efficiency of different schemes by theoretical analysisinstead of experiments The reason is that encrypted imageretrieval schemes are based on different image databasesaccording to the application Also some schemes utilizeencrypted image feature but other schemes employ the hashcode of the image feature

In the schemes under vector space model such as Yuan[20] a skew index tree leads to119874(log119873 sim 119873) search time Xia[19] uses local-sensitive hash (LSH) to construct a prefiltertable Then with corresponding image feature the cloudcarries out a refinement of candidate results Thus it is asublinear scheme Cheng [22 23] does not build an indextree so the retrieval efficiency is 119874(119873) Weng [6 7] usesa hash table to improve the search efficiency (ie 119874(1 sim119873)) In short the index tree or hash table is a trade-offbetween space and efficiency Although Wengrsquos scheme [7]and proposed scheme both protect query pattern and accesspattern the hash code and feature vector are respectivelyused SpecificallyWeng [7] omits certain bits of the hash codeof query image and returns multiple candidates to the userfor refinement The user is involved in completing retrievaland obtaining the accurate result In our scheme the usercan set 120573 as 1205721 or 1205722 to retrieve exact query image throughdifferent paths Unlike [7] our scheme does not need the userto participate in the feature comparison However to protect

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 8: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

8 Security and Communication Networks

2 3 4 5 6 7 8 9 101 of Category

005

01

015

02

025

03

035

04

Aver

age C

orre

latio

n of

HSV

0

001

002

003

004

095

Aver

age C

orre

latio

n of

DCT

2 3 4 5 6 7 8 9 101 of Category

corrminusintracorrminusintercorrminusdiff

corrminusintracorrminusintercorrminusdiff

Figure 5 The average correlation of HSV histogram (left) and DCT histogram (right) under same category and different categories

cloud For images in the database the partial encryption canprotect their confidentiality [23 25]

(2) Since the feature vector 997888rarrq of the query image isextended by random numbers 120574 and 120579 it can generatedifferent ciphertext queries Q The cloud cannot identify thesame query image from its different ciphertext queries Thequery pattern is protected

(3) According to different 120573 different features of samequery image can retrieve different images When 120573 is equal to1205721 or 1205722 different results can be recovered to the same queryimage Thus the retrieved result and path of the same queryimage are not unique but the retrieval is still effective Thecloud cannot link the different returned image to the samequery image The access pattern is protected

(4) In addition since the height of SBBT is reduced oneleaf node will point to multiple encrypted images It furtherbreaks the one-to-one relationship between the query imageand the return imagesThe access pattern is further protected

In short the SBBT is secure in the known ciphertextmodel and can resist the statistical attack in the knownbackground model

4 Performance Analysis

In this section we show the experimental results andefficiency analysis of the proposed scheme As we knowthat JPEG is the most commonly used image format andaccounting for up to 95 of images on the web [34] sowe adopt the Corel database [10] for the similarity searchThe database contains 1000 JPEG images which belong to 10categories African Beach Architecture Buses DinosaursElephants Flowers Horses Mountain and Food Each ofthem includes 100 images As [5] we split an image into 256blocks and extract hue saturation and value of brightness

of each block Then we perform hierarchical clustering ofblock features to construct the ldquobag of wordrdquo model Theusage frequency of the visual word is the HSV histogramof the image As [22] DCAC coefficients of image blocksare quantized into different bins and the DCT histogram isthe usage frequency of quantized DCAC coefficients Thusthe color feature p1119894 and texture feature p2119894 are set as the991-dimensional HSV histogram and 780-dimensional DCThistogram respectively

41 Parameters Setting First of all we should set parameters1205721 of 997888rarrp 119894 and 1205722 of 997888rarrp 1198941015840 to construct an index tree Becausethe integrated features compose of HSV histograms and DCThistogram we first evaluate average histogram similarity ofthe images in the same category and different categories(denoted as corr-intra and corr-inter respectively) as shownin Figure 5 We can see that the value of corr-intra is largerthan corr-inter thus images of different categories can bedistinguished by each histogram According to the differencebetween corr-intra and corr-inter (ie corr-diff in Figure 5)we set 1205721 = 01 for the integrated feature 997888rarrp 119894 of the databaseC The purpose is to make the HSV histogram and DCThistogram have the same contribution to the search results

For the parameter 1205722 of 997888rarrp 1198941015840 we hope that the index treeof database C C1015840 can better protect the access pattern Weuse two different metrics to measure the privacy protectionachieved by different 1205722 The two metrics are the number ofdifferent image categories in each subtree and the categoryentropy of each subtree Figure 6 shows the average numberof image categories contained in the subtree of differentlevels Because the index tree of the database based ondifferent1205722 has same height of 11 (ie lceillog20002 rceil) the proposedBBC algorithm performs well However we can see from

Security and Communication Networks 9

Table 1 Average category entropy of each level

Parameter Level of the Index Tree1205722 1 2 3 4 5 6 7 8 9 10 1102 299 249 21 175 138 107 081 062 044 021 003 32 282 207 181 143 118 091 07 048 024 004 332 288 228 18 153 128 102 078 054 028 005 332 303 231 183 156 127 103 079 055 03 006 332 291 237 196 165 141 111 089 061 032 007 332 26 21 171 146 124 103 081 057 029 008 332 292 232 185 152 126 104 081 057 03 009 332 292 225 178 149 127 103 083 058 03 0

0

2

4

6

8

10

Aver

age N

umbe

r of C

ateg

orie

s

2 3 4 5 6 7 8 9 10 111Level of Index Tree

2 = 02

2 = 03

2 = 04

2 = 05

2 = 06

2 = 07

2 = 08

2 = 09

Figure 6 The average number of image categories

Figure 6 that the subtree near the leaf node contains a similarnumber of image categories We will use category entropy tomeasure the privacy The category entropy which is definedas minussum119894(119888119894sum119894 119888119894) log2(119888119894sum119894 119888119894) (119888119894 is the number of existing119894 category) is used to measure the randomness of imagecategories in the subtree From Table 1 we know that theindex tree of 1205722 = 06 has the highest average entropy at the3th-10th level even its average category number at 3th-6thlevel is not the biggest in Figure 6 Thus we set the weightedfactor 1205722 = 06 for the integrated feature 997888rarrp 1198941015840 of database C1015840

42 Index Tree Construction In order to prove that ouralgorithm can classify feature vectors in a balanced mannerwe use average similarity of the cluster size to comparethe clustering performance of proposed BBC algorithmbalanced k-means (ie Bkmeans) k-means++ and FSCLThe average similarity based on (5) is defined as 119878 = 119904(120591(119905+1)

119896)

BBCBkmeans

Kmeans++FSCL

0

005

07

075

08

085

09

095

1

Sim

ilarit

y of

Clu

ster S

ize (

S )

5 10 150Level of Index Tree

Figure 7 The average size similarity of two clusters

FromFigure 7 we can see that the index trees generated by theproposed BBC algorithm Bkmeans k-means++ and FSCLhave the height of 11 12 13 and 16 respectively The smallerthe height of the index tree indicates that the proposedalgorithm can classify feature vectors into two clusters moreevenly at each level of the index tree Figure 7 shows thatthe cluster size of FSCL is the most unbalanced at 1th-11thlevels There are two reasons the first is that the change ofthe weight factor affects the stability of the clustering resultand the second is that the value of weight factor impactsthe distance between the cluster center and other featurevectors On contrast due to the use of conscience weightfactor the proposed algorithmperforms better than Bkmeansand k-means++ and the result is much better than FSCLThe reason is that the bipartite graph matching mechanismconstrains a balance of the two clusters and restricts theinfluence of weight factor In short the weight factor is stilleffective in improve the balance of clustering In fact aboveresult is also similar in the database of different 1205721 1205722

10 Security and Communication Networks

Table 2 Minimal Dunn index of each level

Algorithm Level of the Index Tree1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

BBC 047 001 001 002 001 001 001 001 001 002 infin - - - - -Bkmeans 047 001 002 001 002 002 002 003 004 024 073 infin - - - -Kmeans++ 021 001 001 001 001 001 001 001 002 003 022 105 infin - - -FSCL 047 004 005 006 007 01 015 019 023 038 028 058 062 066 095 infin

Table 3 Time cost of index tree construction

Algorithm BBC Bkmeans Kmeans++[20] FSCLTime (s) 111 2316 1749 254

In further we employ Dunn index (DI) [35] to evaluatementioned clustering algorithms The Dunn index is definedas

119863119868 = minforall119909isin1198621198711forall119910isin1198621198712 119863 ⟨119909 119910⟩max119896=12 maxforall119909119910isin119862119871119896 119863 ⟨119909 119910⟩ (12)

where the numerator is the minimal distance of vectors indifferent clusters and the denominator is the largest within-cluster distance Thus we can know that when separatingtwo points into two clusters the denominator is zero andDI is infinite A high value of DI means good compactnessTable 2 shows the minimal DI of clustering results in eachlevel In the same level of the index tree the most of BBCrsquos DIis smaller than other algorithms (ie Bkmeans K-means++and FSCL) Meanwhile BBCrsquos DI of the last level is muchsmaller than other algorithms Obviously for the proposedBBC algorithm the cluster-size similarity is with higherpriority than clustering compactness

The above algorithms are implemented using MATLABon a PC with an Intel Core i5 32 GHz CPU and 16 GB RAMTime consumption for index tree construction is shown inTable 3 In particular we execute 30 times clustering withrandom initial centers and select the result of highest scoreof cluster-size similarity Bkmeans is most time consumingbecause it is not designed to find a vector that splits a datasetevenly In contrast proposed BBC costs the least time

43 Retrieval Performance To evaluate the retrieval perfor-mance each image in the Corel database is selected as a queryimage The retrieval performance is evaluated by averageprecision-recall (PR) curve ie

119875119903119890119888119894119904119894119900119899 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119903119890119905119906119903119899119890119889 119894119898119886119892119890119904

119877119890119888119886119897119897 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119901119900119904i119905119894V119890 119894119898119886119892119890119904

(13)

Figure 8(a) compares the PR curves of the query feature basedon different 120573 values It can be seen that the performanceof different 120573 achieves high precision when recall belongsto (0 005) Besides when only one image is returned ierecall=1 the precision of 120573 = 01 or 06 is 100The reason

is thatwe set120573 as1205721 or1205722 Although two integrated features ofa query image are different the retrieved results are differentencrypted versions of the same plaintext image Since thequery features the corresponding results and search pathsare different the statistical information of query image isprotected

Figure 8(b) shows the retrieval performance of thereduced index trees with different heights It can be seen thatthe precision increases with the height of index tree even if 120573is set as 0 or 1 Further Figure 9 shows the average maximumheight of a reduced index tree inwhich the returned leaf nodecontains the exact query image It can be seen that the heightof reduced index tree is 11 when 120573 is set as 01 or 06 If 120573 is setcloser to 01 or 06 the height of reduced index tree is nearerto 11 When the height is 9 the most of exact query images areretrievable in the reduced index tree and one search returnsfour (22) images Fortunately as shown in Figure 8(b) theprecision is about 70 at the 9th level of the index tree Thusthe retrieval is still effective

44 Privacy Protection We consider the privacy protectionfrom two aspects query pattern and access pattern Since thequery feature is an integrated histogram ie 997888rarrq = [120573q1 (1 minus120573)q2] Different 120573 correspond to different query features Inaddition even if 120573 is determined the feature will be extendedby random numbers 120579 120574 as 997888rarrq = 120579[120574 997888rarrq ] Thus the encryptedfeature is Q = M(120579[120574 120573q1 (1 minus 120573)q2])T Obviously theencrypted feature of a query image is not unique The querypattern of each query image is protected

To protect access pattern the retrieved result of a queryimage should be not uniqueWehave built an index tree basedon features of original database (1205721 = 01) and features of thecopied database (1205722 = 06) Then the integrated features ofdifferent 120573 are used to retrieve in the index tree We know thefact that the returned result is a leaf node of the index treeand different leaf nodes indicate different access paths So wedefine a path privacy to evaluate the protection level of accesspattern

119875119903119894V119886119888119910 = 119900119891 119889119894119891119891119890119903119890119899119905 119904119890119886119903119888ℎ 119901119886119905ℎ119904 119900119891 119886119897119897 119904119890119886119903119888ℎ 119901119886119905ℎ119904 (14)

Different search path can be obtained by using the queryfeature of different 120573 Figures 10(a) and 10(b) show the path

Security and Communication Networks 11

=0=0025=005=0075=01=03

=035=04=06=08=1

40

50

60

70

80

90

100Pr

ecisi

on (

)

5 10 15 20 25 300Recall ()

(a)

0 2 4 6 8 10 120

20

40

60

80

100

Height of Index Tree

Prec

ision

()

=0=0025=005=0075=01=03

=035=04=06=08=1

(b)

Figure 8 (a) Precision-recall curves of original index tree (b) Retrieval precision of reduced index trees

0 0025 005 0075 01 03 035 04 06 08 1

Parameter

8

9

10

11

12

Hei

ght o

f Red

uced

Inde

x Tr

ee

Figure 9 Under different 120573 the average maximum height of areduced index tree that the query image can retrieve itself

privacy of different 120573 where 120573 = 01 and 06 are baselinesrespectively It can be seen that the path privacy is 100when120573 is set as 06 in Figure 10(a) and 01 in Figure 10(b) Thereason is that retrieval is separately carried out in the left orright subtree of the root node Figure 10(a) also shows thatsearch paths are in the same subtree of the root node when120573 is set to 06 08 or 1 But the search path of 120573 = 08 or1 is still different with 120573 = 06 as shown in Figure 10(b)Similarly when 120573 is set to 0 00025 005 0075 01 03 or035 the search path is in the same subtree of the root nodebut still different Besides when 120573 is set to 04 the searchtraverses in both subtrees of the root node In addition thepath privacy increases with the height of the tree In short the

access pattern can be protected by setting different 120573 Pleasenote that usage frequency of different 120573 can be optimizedbased on the frequency distribution of the query images

In summary Table 4 compares the proposed scheme withother schemes in the terms of search efficiency query patternprotection and access pattern protection We compare thesearch efficiency of different schemes by theoretical analysisinstead of experiments The reason is that encrypted imageretrieval schemes are based on different image databasesaccording to the application Also some schemes utilizeencrypted image feature but other schemes employ the hashcode of the image feature

In the schemes under vector space model such as Yuan[20] a skew index tree leads to119874(log119873 sim 119873) search time Xia[19] uses local-sensitive hash (LSH) to construct a prefiltertable Then with corresponding image feature the cloudcarries out a refinement of candidate results Thus it is asublinear scheme Cheng [22 23] does not build an indextree so the retrieval efficiency is 119874(119873) Weng [6 7] usesa hash table to improve the search efficiency (ie 119874(1 sim119873)) In short the index tree or hash table is a trade-offbetween space and efficiency Although Wengrsquos scheme [7]and proposed scheme both protect query pattern and accesspattern the hash code and feature vector are respectivelyused SpecificallyWeng [7] omits certain bits of the hash codeof query image and returns multiple candidates to the userfor refinement The user is involved in completing retrievaland obtaining the accurate result In our scheme the usercan set 120573 as 1205721 or 1205722 to retrieve exact query image throughdifferent paths Unlike [7] our scheme does not need the userto participate in the feature comparison However to protect

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 9: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

Security and Communication Networks 9

Table 1 Average category entropy of each level

Parameter Level of the Index Tree1205722 1 2 3 4 5 6 7 8 9 10 1102 299 249 21 175 138 107 081 062 044 021 003 32 282 207 181 143 118 091 07 048 024 004 332 288 228 18 153 128 102 078 054 028 005 332 303 231 183 156 127 103 079 055 03 006 332 291 237 196 165 141 111 089 061 032 007 332 26 21 171 146 124 103 081 057 029 008 332 292 232 185 152 126 104 081 057 03 009 332 292 225 178 149 127 103 083 058 03 0

0

2

4

6

8

10

Aver

age N

umbe

r of C

ateg

orie

s

2 3 4 5 6 7 8 9 10 111Level of Index Tree

2 = 02

2 = 03

2 = 04

2 = 05

2 = 06

2 = 07

2 = 08

2 = 09

Figure 6 The average number of image categories

Figure 6 that the subtree near the leaf node contains a similarnumber of image categories We will use category entropy tomeasure the privacy The category entropy which is definedas minussum119894(119888119894sum119894 119888119894) log2(119888119894sum119894 119888119894) (119888119894 is the number of existing119894 category) is used to measure the randomness of imagecategories in the subtree From Table 1 we know that theindex tree of 1205722 = 06 has the highest average entropy at the3th-10th level even its average category number at 3th-6thlevel is not the biggest in Figure 6 Thus we set the weightedfactor 1205722 = 06 for the integrated feature 997888rarrp 1198941015840 of database C1015840

42 Index Tree Construction In order to prove that ouralgorithm can classify feature vectors in a balanced mannerwe use average similarity of the cluster size to comparethe clustering performance of proposed BBC algorithmbalanced k-means (ie Bkmeans) k-means++ and FSCLThe average similarity based on (5) is defined as 119878 = 119904(120591(119905+1)

119896)

BBCBkmeans

Kmeans++FSCL

0

005

07

075

08

085

09

095

1

Sim

ilarit

y of

Clu

ster S

ize (

S )

5 10 150Level of Index Tree

Figure 7 The average size similarity of two clusters

FromFigure 7 we can see that the index trees generated by theproposed BBC algorithm Bkmeans k-means++ and FSCLhave the height of 11 12 13 and 16 respectively The smallerthe height of the index tree indicates that the proposedalgorithm can classify feature vectors into two clusters moreevenly at each level of the index tree Figure 7 shows thatthe cluster size of FSCL is the most unbalanced at 1th-11thlevels There are two reasons the first is that the change ofthe weight factor affects the stability of the clustering resultand the second is that the value of weight factor impactsthe distance between the cluster center and other featurevectors On contrast due to the use of conscience weightfactor the proposed algorithmperforms better than Bkmeansand k-means++ and the result is much better than FSCLThe reason is that the bipartite graph matching mechanismconstrains a balance of the two clusters and restricts theinfluence of weight factor In short the weight factor is stilleffective in improve the balance of clustering In fact aboveresult is also similar in the database of different 1205721 1205722

10 Security and Communication Networks

Table 2 Minimal Dunn index of each level

Algorithm Level of the Index Tree1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

BBC 047 001 001 002 001 001 001 001 001 002 infin - - - - -Bkmeans 047 001 002 001 002 002 002 003 004 024 073 infin - - - -Kmeans++ 021 001 001 001 001 001 001 001 002 003 022 105 infin - - -FSCL 047 004 005 006 007 01 015 019 023 038 028 058 062 066 095 infin

Table 3 Time cost of index tree construction

Algorithm BBC Bkmeans Kmeans++[20] FSCLTime (s) 111 2316 1749 254

In further we employ Dunn index (DI) [35] to evaluatementioned clustering algorithms The Dunn index is definedas

119863119868 = minforall119909isin1198621198711forall119910isin1198621198712 119863 ⟨119909 119910⟩max119896=12 maxforall119909119910isin119862119871119896 119863 ⟨119909 119910⟩ (12)

where the numerator is the minimal distance of vectors indifferent clusters and the denominator is the largest within-cluster distance Thus we can know that when separatingtwo points into two clusters the denominator is zero andDI is infinite A high value of DI means good compactnessTable 2 shows the minimal DI of clustering results in eachlevel In the same level of the index tree the most of BBCrsquos DIis smaller than other algorithms (ie Bkmeans K-means++and FSCL) Meanwhile BBCrsquos DI of the last level is muchsmaller than other algorithms Obviously for the proposedBBC algorithm the cluster-size similarity is with higherpriority than clustering compactness

The above algorithms are implemented using MATLABon a PC with an Intel Core i5 32 GHz CPU and 16 GB RAMTime consumption for index tree construction is shown inTable 3 In particular we execute 30 times clustering withrandom initial centers and select the result of highest scoreof cluster-size similarity Bkmeans is most time consumingbecause it is not designed to find a vector that splits a datasetevenly In contrast proposed BBC costs the least time

43 Retrieval Performance To evaluate the retrieval perfor-mance each image in the Corel database is selected as a queryimage The retrieval performance is evaluated by averageprecision-recall (PR) curve ie

119875119903119890119888119894119904119894119900119899 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119903119890119905119906119903119899119890119889 119894119898119886119892119890119904

119877119890119888119886119897119897 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119901119900119904i119905119894V119890 119894119898119886119892119890119904

(13)

Figure 8(a) compares the PR curves of the query feature basedon different 120573 values It can be seen that the performanceof different 120573 achieves high precision when recall belongsto (0 005) Besides when only one image is returned ierecall=1 the precision of 120573 = 01 or 06 is 100The reason

is thatwe set120573 as1205721 or1205722 Although two integrated features ofa query image are different the retrieved results are differentencrypted versions of the same plaintext image Since thequery features the corresponding results and search pathsare different the statistical information of query image isprotected

Figure 8(b) shows the retrieval performance of thereduced index trees with different heights It can be seen thatthe precision increases with the height of index tree even if 120573is set as 0 or 1 Further Figure 9 shows the average maximumheight of a reduced index tree inwhich the returned leaf nodecontains the exact query image It can be seen that the heightof reduced index tree is 11 when 120573 is set as 01 or 06 If 120573 is setcloser to 01 or 06 the height of reduced index tree is nearerto 11 When the height is 9 the most of exact query images areretrievable in the reduced index tree and one search returnsfour (22) images Fortunately as shown in Figure 8(b) theprecision is about 70 at the 9th level of the index tree Thusthe retrieval is still effective

44 Privacy Protection We consider the privacy protectionfrom two aspects query pattern and access pattern Since thequery feature is an integrated histogram ie 997888rarrq = [120573q1 (1 minus120573)q2] Different 120573 correspond to different query features Inaddition even if 120573 is determined the feature will be extendedby random numbers 120579 120574 as 997888rarrq = 120579[120574 997888rarrq ] Thus the encryptedfeature is Q = M(120579[120574 120573q1 (1 minus 120573)q2])T Obviously theencrypted feature of a query image is not unique The querypattern of each query image is protected

To protect access pattern the retrieved result of a queryimage should be not uniqueWehave built an index tree basedon features of original database (1205721 = 01) and features of thecopied database (1205722 = 06) Then the integrated features ofdifferent 120573 are used to retrieve in the index tree We know thefact that the returned result is a leaf node of the index treeand different leaf nodes indicate different access paths So wedefine a path privacy to evaluate the protection level of accesspattern

119875119903119894V119886119888119910 = 119900119891 119889119894119891119891119890119903119890119899119905 119904119890119886119903119888ℎ 119901119886119905ℎ119904 119900119891 119886119897119897 119904119890119886119903119888ℎ 119901119886119905ℎ119904 (14)

Different search path can be obtained by using the queryfeature of different 120573 Figures 10(a) and 10(b) show the path

Security and Communication Networks 11

=0=0025=005=0075=01=03

=035=04=06=08=1

40

50

60

70

80

90

100Pr

ecisi

on (

)

5 10 15 20 25 300Recall ()

(a)

0 2 4 6 8 10 120

20

40

60

80

100

Height of Index Tree

Prec

ision

()

=0=0025=005=0075=01=03

=035=04=06=08=1

(b)

Figure 8 (a) Precision-recall curves of original index tree (b) Retrieval precision of reduced index trees

0 0025 005 0075 01 03 035 04 06 08 1

Parameter

8

9

10

11

12

Hei

ght o

f Red

uced

Inde

x Tr

ee

Figure 9 Under different 120573 the average maximum height of areduced index tree that the query image can retrieve itself

privacy of different 120573 where 120573 = 01 and 06 are baselinesrespectively It can be seen that the path privacy is 100when120573 is set as 06 in Figure 10(a) and 01 in Figure 10(b) Thereason is that retrieval is separately carried out in the left orright subtree of the root node Figure 10(a) also shows thatsearch paths are in the same subtree of the root node when120573 is set to 06 08 or 1 But the search path of 120573 = 08 or1 is still different with 120573 = 06 as shown in Figure 10(b)Similarly when 120573 is set to 0 00025 005 0075 01 03 or035 the search path is in the same subtree of the root nodebut still different Besides when 120573 is set to 04 the searchtraverses in both subtrees of the root node In addition thepath privacy increases with the height of the tree In short the

access pattern can be protected by setting different 120573 Pleasenote that usage frequency of different 120573 can be optimizedbased on the frequency distribution of the query images

In summary Table 4 compares the proposed scheme withother schemes in the terms of search efficiency query patternprotection and access pattern protection We compare thesearch efficiency of different schemes by theoretical analysisinstead of experiments The reason is that encrypted imageretrieval schemes are based on different image databasesaccording to the application Also some schemes utilizeencrypted image feature but other schemes employ the hashcode of the image feature

In the schemes under vector space model such as Yuan[20] a skew index tree leads to119874(log119873 sim 119873) search time Xia[19] uses local-sensitive hash (LSH) to construct a prefiltertable Then with corresponding image feature the cloudcarries out a refinement of candidate results Thus it is asublinear scheme Cheng [22 23] does not build an indextree so the retrieval efficiency is 119874(119873) Weng [6 7] usesa hash table to improve the search efficiency (ie 119874(1 sim119873)) In short the index tree or hash table is a trade-offbetween space and efficiency Although Wengrsquos scheme [7]and proposed scheme both protect query pattern and accesspattern the hash code and feature vector are respectivelyused SpecificallyWeng [7] omits certain bits of the hash codeof query image and returns multiple candidates to the userfor refinement The user is involved in completing retrievaland obtaining the accurate result In our scheme the usercan set 120573 as 1205721 or 1205722 to retrieve exact query image throughdifferent paths Unlike [7] our scheme does not need the userto participate in the feature comparison However to protect

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 10: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

10 Security and Communication Networks

Table 2 Minimal Dunn index of each level

Algorithm Level of the Index Tree1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

BBC 047 001 001 002 001 001 001 001 001 002 infin - - - - -Bkmeans 047 001 002 001 002 002 002 003 004 024 073 infin - - - -Kmeans++ 021 001 001 001 001 001 001 001 002 003 022 105 infin - - -FSCL 047 004 005 006 007 01 015 019 023 038 028 058 062 066 095 infin

Table 3 Time cost of index tree construction

Algorithm BBC Bkmeans Kmeans++[20] FSCLTime (s) 111 2316 1749 254

In further we employ Dunn index (DI) [35] to evaluatementioned clustering algorithms The Dunn index is definedas

119863119868 = minforall119909isin1198621198711forall119910isin1198621198712 119863 ⟨119909 119910⟩max119896=12 maxforall119909119910isin119862119871119896 119863 ⟨119909 119910⟩ (12)

where the numerator is the minimal distance of vectors indifferent clusters and the denominator is the largest within-cluster distance Thus we can know that when separatingtwo points into two clusters the denominator is zero andDI is infinite A high value of DI means good compactnessTable 2 shows the minimal DI of clustering results in eachlevel In the same level of the index tree the most of BBCrsquos DIis smaller than other algorithms (ie Bkmeans K-means++and FSCL) Meanwhile BBCrsquos DI of the last level is muchsmaller than other algorithms Obviously for the proposedBBC algorithm the cluster-size similarity is with higherpriority than clustering compactness

The above algorithms are implemented using MATLABon a PC with an Intel Core i5 32 GHz CPU and 16 GB RAMTime consumption for index tree construction is shown inTable 3 In particular we execute 30 times clustering withrandom initial centers and select the result of highest scoreof cluster-size similarity Bkmeans is most time consumingbecause it is not designed to find a vector that splits a datasetevenly In contrast proposed BBC costs the least time

43 Retrieval Performance To evaluate the retrieval perfor-mance each image in the Corel database is selected as a queryimage The retrieval performance is evaluated by averageprecision-recall (PR) curve ie

119875119903119890119888119894119904119894119900119899 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119903119890119905119906119903119899119890119889 119894119898119886119892119890119904

119877119890119888119886119897119897 = 119900119891 119903119890119905119906119903119899119890119889 119901119900119904119894119905119894V119890 119894119898119886119892119890119904 119900119891 119886119897119897 119901119900119904i119905119894V119890 119894119898119886119892119890119904

(13)

Figure 8(a) compares the PR curves of the query feature basedon different 120573 values It can be seen that the performanceof different 120573 achieves high precision when recall belongsto (0 005) Besides when only one image is returned ierecall=1 the precision of 120573 = 01 or 06 is 100The reason

is thatwe set120573 as1205721 or1205722 Although two integrated features ofa query image are different the retrieved results are differentencrypted versions of the same plaintext image Since thequery features the corresponding results and search pathsare different the statistical information of query image isprotected

Figure 8(b) shows the retrieval performance of thereduced index trees with different heights It can be seen thatthe precision increases with the height of index tree even if 120573is set as 0 or 1 Further Figure 9 shows the average maximumheight of a reduced index tree inwhich the returned leaf nodecontains the exact query image It can be seen that the heightof reduced index tree is 11 when 120573 is set as 01 or 06 If 120573 is setcloser to 01 or 06 the height of reduced index tree is nearerto 11 When the height is 9 the most of exact query images areretrievable in the reduced index tree and one search returnsfour (22) images Fortunately as shown in Figure 8(b) theprecision is about 70 at the 9th level of the index tree Thusthe retrieval is still effective

44 Privacy Protection We consider the privacy protectionfrom two aspects query pattern and access pattern Since thequery feature is an integrated histogram ie 997888rarrq = [120573q1 (1 minus120573)q2] Different 120573 correspond to different query features Inaddition even if 120573 is determined the feature will be extendedby random numbers 120579 120574 as 997888rarrq = 120579[120574 997888rarrq ] Thus the encryptedfeature is Q = M(120579[120574 120573q1 (1 minus 120573)q2])T Obviously theencrypted feature of a query image is not unique The querypattern of each query image is protected

To protect access pattern the retrieved result of a queryimage should be not uniqueWehave built an index tree basedon features of original database (1205721 = 01) and features of thecopied database (1205722 = 06) Then the integrated features ofdifferent 120573 are used to retrieve in the index tree We know thefact that the returned result is a leaf node of the index treeand different leaf nodes indicate different access paths So wedefine a path privacy to evaluate the protection level of accesspattern

119875119903119894V119886119888119910 = 119900119891 119889119894119891119891119890119903119890119899119905 119904119890119886119903119888ℎ 119901119886119905ℎ119904 119900119891 119886119897119897 119904119890119886119903119888ℎ 119901119886119905ℎ119904 (14)

Different search path can be obtained by using the queryfeature of different 120573 Figures 10(a) and 10(b) show the path

Security and Communication Networks 11

=0=0025=005=0075=01=03

=035=04=06=08=1

40

50

60

70

80

90

100Pr

ecisi

on (

)

5 10 15 20 25 300Recall ()

(a)

0 2 4 6 8 10 120

20

40

60

80

100

Height of Index Tree

Prec

ision

()

=0=0025=005=0075=01=03

=035=04=06=08=1

(b)

Figure 8 (a) Precision-recall curves of original index tree (b) Retrieval precision of reduced index trees

0 0025 005 0075 01 03 035 04 06 08 1

Parameter

8

9

10

11

12

Hei

ght o

f Red

uced

Inde

x Tr

ee

Figure 9 Under different 120573 the average maximum height of areduced index tree that the query image can retrieve itself

privacy of different 120573 where 120573 = 01 and 06 are baselinesrespectively It can be seen that the path privacy is 100when120573 is set as 06 in Figure 10(a) and 01 in Figure 10(b) Thereason is that retrieval is separately carried out in the left orright subtree of the root node Figure 10(a) also shows thatsearch paths are in the same subtree of the root node when120573 is set to 06 08 or 1 But the search path of 120573 = 08 or1 is still different with 120573 = 06 as shown in Figure 10(b)Similarly when 120573 is set to 0 00025 005 0075 01 03 or035 the search path is in the same subtree of the root nodebut still different Besides when 120573 is set to 04 the searchtraverses in both subtrees of the root node In addition thepath privacy increases with the height of the tree In short the

access pattern can be protected by setting different 120573 Pleasenote that usage frequency of different 120573 can be optimizedbased on the frequency distribution of the query images

In summary Table 4 compares the proposed scheme withother schemes in the terms of search efficiency query patternprotection and access pattern protection We compare thesearch efficiency of different schemes by theoretical analysisinstead of experiments The reason is that encrypted imageretrieval schemes are based on different image databasesaccording to the application Also some schemes utilizeencrypted image feature but other schemes employ the hashcode of the image feature

In the schemes under vector space model such as Yuan[20] a skew index tree leads to119874(log119873 sim 119873) search time Xia[19] uses local-sensitive hash (LSH) to construct a prefiltertable Then with corresponding image feature the cloudcarries out a refinement of candidate results Thus it is asublinear scheme Cheng [22 23] does not build an indextree so the retrieval efficiency is 119874(119873) Weng [6 7] usesa hash table to improve the search efficiency (ie 119874(1 sim119873)) In short the index tree or hash table is a trade-offbetween space and efficiency Although Wengrsquos scheme [7]and proposed scheme both protect query pattern and accesspattern the hash code and feature vector are respectivelyused SpecificallyWeng [7] omits certain bits of the hash codeof query image and returns multiple candidates to the userfor refinement The user is involved in completing retrievaland obtaining the accurate result In our scheme the usercan set 120573 as 1205721 or 1205722 to retrieve exact query image throughdifferent paths Unlike [7] our scheme does not need the userto participate in the feature comparison However to protect

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 11: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

Security and Communication Networks 11

=0=0025=005=0075=01=03

=035=04=06=08=1

40

50

60

70

80

90

100Pr

ecisi

on (

)

5 10 15 20 25 300Recall ()

(a)

0 2 4 6 8 10 120

20

40

60

80

100

Height of Index Tree

Prec

ision

()

=0=0025=005=0075=01=03

=035=04=06=08=1

(b)

Figure 8 (a) Precision-recall curves of original index tree (b) Retrieval precision of reduced index trees

0 0025 005 0075 01 03 035 04 06 08 1

Parameter

8

9

10

11

12

Hei

ght o

f Red

uced

Inde

x Tr

ee

Figure 9 Under different 120573 the average maximum height of areduced index tree that the query image can retrieve itself

privacy of different 120573 where 120573 = 01 and 06 are baselinesrespectively It can be seen that the path privacy is 100when120573 is set as 06 in Figure 10(a) and 01 in Figure 10(b) Thereason is that retrieval is separately carried out in the left orright subtree of the root node Figure 10(a) also shows thatsearch paths are in the same subtree of the root node when120573 is set to 06 08 or 1 But the search path of 120573 = 08 or1 is still different with 120573 = 06 as shown in Figure 10(b)Similarly when 120573 is set to 0 00025 005 0075 01 03 or035 the search path is in the same subtree of the root nodebut still different Besides when 120573 is set to 04 the searchtraverses in both subtrees of the root node In addition thepath privacy increases with the height of the tree In short the

access pattern can be protected by setting different 120573 Pleasenote that usage frequency of different 120573 can be optimizedbased on the frequency distribution of the query images

In summary Table 4 compares the proposed scheme withother schemes in the terms of search efficiency query patternprotection and access pattern protection We compare thesearch efficiency of different schemes by theoretical analysisinstead of experiments The reason is that encrypted imageretrieval schemes are based on different image databasesaccording to the application Also some schemes utilizeencrypted image feature but other schemes employ the hashcode of the image feature

In the schemes under vector space model such as Yuan[20] a skew index tree leads to119874(log119873 sim 119873) search time Xia[19] uses local-sensitive hash (LSH) to construct a prefiltertable Then with corresponding image feature the cloudcarries out a refinement of candidate results Thus it is asublinear scheme Cheng [22 23] does not build an indextree so the retrieval efficiency is 119874(119873) Weng [6 7] usesa hash table to improve the search efficiency (ie 119874(1 sim119873)) In short the index tree or hash table is a trade-offbetween space and efficiency Although Wengrsquos scheme [7]and proposed scheme both protect query pattern and accesspattern the hash code and feature vector are respectivelyused SpecificallyWeng [7] omits certain bits of the hash codeof query image and returns multiple candidates to the userfor refinement The user is involved in completing retrievaland obtaining the accurate result In our scheme the usercan set 120573 as 1205721 or 1205722 to retrieve exact query image throughdifferent paths Unlike [7] our scheme does not need the userto participate in the feature comparison However to protect

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 12: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

12 Security and Communication Networks

0

20

40

60

80

100Pa

th P

rivac

y (

)

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1lowast

(a)

0

20

40

60

80

100

Path

Priv

acy

()

2 4 6 8 10 120Level of Index Tree

=0=0025=005=0075=01=03

=035=04=06=08=1

lowast

(b)

Figure 10 Path privacy of different 120573 (a) 120573 = 01 is the baseline and (b) 120573 = 06 is the baseline

Table 4 Comparison of encrypted image retrieval schemes

Goals Yuan [20] Xia [19] Weng [6] Weng [7] Cheng [22 23] ProposedEfficiency 119874(log119873 sim 119873) 119874(1 sim 119873) 119874(119873) 119874(log2119873)Query Pattern radic radic times radic times radicAccess Pattern times times times radic times radic

access pattern and alleviate computation burden on the userwe copy the image database and cost more storage space inthe cloud Fortunately the retrieval efficiency is 119874(log 2119873)5 Conclusion

This paper introduces a secure and efficient image retrievalscheme over encrypted cloud data In this scheme wepropose a novel clustering algorithm ie BBC to build abalanced index tree namely BBT Thus our scheme canachieve logarithmic search time Firstly to support effectiveimage retrieval in the ciphertext we employ ASPE to encryptthe index of the image database and feature of the queryimage In this case the proposed SBBT is secure in the knownciphertext model Secondly to resist the statistical attack onknown background model we copy the database and build anew SBBT of the original database and the copied databaseAfter that adjusting the weighted factor of integrated featurethe search results of a query image are different leaf nodesFinally through merging the subtree of SBBT the leaf nodeof new SBBT contains multiple encrypted images Thereforethe one-to-one relationship between the query image and the

return image is broken In short the SBBT can protect querypattern and access pattern Furthermore proposed BBCand SBBT can be used as independent tools for encrypteddocument retrieval

In future we will improve the ASPE to support multiuserscenario where a dishonest user may try to reveal theencrypted query of other users It is also a meaningful workto generate a generic high-semantic feature

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (U1636206 61602295 61525203 and61472235) the Shanghai Dawn Scholar Plan (14SG36) theShanghai Excellent Academic Leader Plan (16XD1401200)the Natural Science Foundation of Fujian Province(2017J01502) and the Scientific Research Foundation ofFuzhou University (510483)

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 13: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

Security and Communication Networks 13

References

[1] K Ren C Wang and Q Wang ldquoSecurity challenges for thepublic cloudrdquo IEEE Internet Computing vol 16 no 1 pp 69ndash73 2012

[2] C Gentry ldquoFully homomorphic encryption using ideal latticesrdquoin Proceedings of the 41st annual ACM symposium on Theory ofComputing (STOC rsquo09) pp 169ndash178 ACM Bethesda Md USA2009

[3] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[4] K Li W Zhang C Yang and N Yu ldquoSecurity Analysison One-to-Many Order Preserving Encryption-Based CloudData Searchrdquo IEEE Transactions on Information Forensics andSecurity vol 10 no 9 pp 1918ndash1926 2015

[5] W Lu A Swaminathan A L Varna M Wu et al ldquoEnablingsearch over encrypted multimedia databasesrdquo Media Forensicsand Security vol 7254 2009

[6] L Weng L Amsaleg A Morton and S Marchand-MailletldquoA privacy-preserving framework for large-scale content-basedinformation retrievalrdquo IEEE Transactions on Information Foren-sics and Security vol 10 no 1 pp 152ndash167 2015

[7] L Weng L Amsaleg and T Furon ldquoPrivacy-preserving out-sourced media searchrdquo IEEE Transactions on Knowledge andData Engineering vol 28 no 10 pp 2738ndash2751 2016

[8] W K Wong D W Cheung B Kao and N Mamoulis ldquoSecurekNN computation on encrypted databasesrdquo in Proceedings ofthe International Conference on Management of Data and 28thSymposium on Principles of Database Systems (SIGMOD-PODSrsquo09) pp 139ndash152 Providence RI USA July 2009

[9] P Williams R Sion and B Carbunar ldquoBuilding castles outof mud practical access pattern privacy and correctness onuntrusted storagerdquo inProceedings of the 15thACMConference onComputer and Communications Security CCSrsquo08 pp 139ndash148October 2008

[10] httpwangistpsuedudocsrelated[11] D X Song D Wagner and A Perrig ldquoPractical techniques for

searches on encrypted datardquo in Security and Privacy 2000 SampP2000 Proceedings 2000 IEEE Symposium on IEEE pp 44ndash55IEEE 2000

[12] D Boneh G Di Crescenzo R Ostrovsky and G PersianoldquoPublic key encryption with keyword searchrdquo in Advances incryptologymdashEUROCRYPT 2004 vol 3027 of Lecture Notes inComput Sci pp 506ndash522 Springer Berlin 2004

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo IEEETransactions onParallel andDistributed Systems vol25 no 1 pp 222ndash233 2014

[14] W Sun B Wang N Cao et al ldquoVerifiable privacy-preservingmulti-keyword text search in the cloud supporting similarity-based rankingrdquo IEEE Transactions on Parallel and DistributedSystems vol 25 no 11 pp 3025ndash3035 2014

[15] Z Xia X Wang X Sun and Q Wang ldquoA Secure and DynamicMulti-Keyword Ranked Search Scheme over Encrypted CloudDatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[16] Z Erkin M Franz J Guajardo S Katzenbeisser I Lagendijkand T Toft ldquoPrivacy-Preserving Face Recognitionrdquo in PrivacyEnhancing Technologies vol 5672 of Lecture Notes in Computer

Science pp 235ndash253 Springer Berlin Heidelberg Berlin Hei-delberg 2009

[17] W Lu A L Varna A Swaminathan and M Wu ldquoSecureimage retrieval through feature protectionrdquo inProceedings of theICASSP 2009 - 2009 IEEE International Conference on AcousticsSpeech and Signal Processing pp 1533ndash1536 Taipei TaiwanApril 2009

[18] W Lu A L Varna and M Wu ldquoConfidentiality-preservingimage search A comparative study between homomorphicencryption and distance-preserving randomizationrdquo IEEEAccess vol 2 pp 125ndash141 2014

[19] Z Xia N N Xiong A V Vasilakos and X Sun ldquoEPCBIR Anefficient and privacy-preserving content-based image retrievalscheme in cloud computingrdquo Information Sciences vol 387 pp195ndash204 2017

[20] J Yuan S Yu and L Guo ldquoSEISA Secure and efficientencrypted image search with access controlrdquo in Proceedingsof the IEEE INFOCOM 2015 - IEEE Conference on ComputerCommunications pp 2083ndash2091 Kowloon Hong Kong April2015

[21] X Zhang and H Cheng ldquoHistogram-based retrieval forencrypted jpeg imagesrdquo in Proceedings of the in Signal andInformation Processing (ChinaSIP 2014 IEEE China SummitInternational Conference on IEEE pp 446ndash449 2014

[22] HCheng X Zhang and J Yu ldquoAC-coefficienthistogram-basedretrieval for encrypted JPEG imagesrdquo Multimedia Tools andApplications vol 75 no 21 pp 13791ndash13803 2016

[23] H Cheng X Zhang J Yu and Y Zhang ldquoEncrypted JPEGimage retrieval using block-wise feature comparisonrdquo Journalof Visual Communication and Image Representation vol 40 pp111ndash117 2016

[24] Y Xu J Gong L Xiong Z Xu J Wang and Y Q Shi ldquoAprivacy-preserving content-based image retrieval method incloud environmentrdquo Journal of Visual Communication ImageRepresentation vol 43 no C pp 164ndash172 2017

[25] Z Qian H Zhou X Zhang and W Zhang ldquoSeparablereversible data hiding in encrypted jpeg bitstreamsrdquo IEEETransactions on Dependable Secure Computing no 99 pp 1-12016

[26] Z Qian H Xu X Luo and X Zhang ldquoframework of reversibledata hiding in encrypted jpeg bitstreamsrdquo IEEE Transactions onCircuits Systems for Video Technology no 99 pp 1-1 2018

[27] A Banerjee and J Ghosh ldquoFrequency-sensitive competitivelearning for scalable balanced clustering on high-dimensionalhyperspheresrdquo IEEE Transactions on Neural Networks andLearning Systems vol 15 no 3 pp 702ndash719 2004

[28] A S Galanopoulos R L Moses and S C Ahalt ldquoDiffusionapproximation of frequency sensitive competitive learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 8 no 5 pp 1026ndash1030 1997

[29] C T Althoff A Ulges and A Dengel ldquoBalanced clustering forcontent-based image browsingrdquo in in GI-Informatiktage 2011GI-Informatiktage (Informatik-2011) Gesellschaft fr InformatikBonn Germany

[30] M I Malinen and P Franti ldquoBalancedK-means for clusteringrdquoin Structural Syntactic and Statistical Pattern Recognition PFrantiG BrownM Loog F Escolano andM Pelillo Eds vol8621 of Lecture Notes in Computer Science pp 32ndash41 SpringerBerlin Germany 2014

[31] httpsenwikipediaorgwikiAssignment problem

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 14: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

14 Security and Communication Networks

[32] R Jonker and A Volgenant ldquoA shortest augmenting pathalgorithm for dense and sparse linear assignment problemsrdquoComputing Archives for Scientific Computing vol 38 no 4 pp325ndash340 1987

[33] W Jones A Chawdhary and A King ldquoOptimising theVolgenantndashJonker algorithm for approximating graph edit dis-tancerdquo Pattern Recognition Letters vol 87 pp 47ndash54 2017

[34] G Schaefer ldquoFast JPEG compressed domain image retrievalrdquoin Proceedings of the 5th International Conference onMultimediaComputing and Systems ICMCS 2016 pp 148ndash150marOctober2016

[35] httpsenwikipediaorgwikiDunn index

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 15: Secure and Efficient Image Retrieval over Encrypted Cloud Datadownloads.hindawi.com/journals/scn/2018/7915393.pdf · SecurityandCommunicationNetworks Inspiredbybalancedk-means,werstturn()

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom