[IEEE 2013 International Conference on Information Technology and Applications (ITA) - Chengdu,...



Page 1: [IEEE 2013 International Conference on Information Technology and Applications (ITA) - Chengdu, China (2013.11.16-2013.11.17)] 2013 International Conference on Information Technology

A Query Conversion Scheme for Encrypted Cloud Databases

Hequn Xian, Jing Li and Xiuqing Lu

College of Information Engineering, Qingdao University, Qingdao, China

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China

[email protected]

Abstract—Encryption is an effective way to protect sensitive data that has been outsourced to a cloud computing environment. However, it is computationally expensive for the database management system to execute SQL queries against encrypted data. We present a query conversion scheme that converts traditional database queries into encryption-specific versions, so that the queries can be executed directly on the cloud server. The scheme dramatically increases the efficiency of query processing in encrypted cloud databases, and it provides high transparency for end users by introducing a SQL gateway between the user applications and the cloud server.

Keywords- query conversion; cloud databases; SQL gateway

I. INTRODUCTION

The original form of the cloud database, known as the database as a service (DAS) model, was introduced in [1]. It was also defined as the outsourced database model in other works [2], which preceded the concept of cloud computing. Whether in the cloud or not, database outsourcing offers promising advantages in cost, scalability and availability [3]. Cloud database services such as Google Cloud SQL and Oracle Cloud data are available on the Internet. However, security issues always give us second thoughts when we decide to outsource our data to a cloud server.

To protect sensitive data outsourced to the cloud, encryption techniques have proven feasible and reliable, provided that the algorithms are properly chosen and the keys are safely protected. When data encryption is adopted, traditional SQL queries cannot be executed efficiently, if they can be executed directly at all, because the SQL predicates are in plaintext and database indexes are rendered useless by encryption [4]. In this paper, we present a query conversion scheme that introduces a SQL gateway between the cloud user applications and the server. Traditional SQL queries are converted to encryption-specific versions without the client applications being aware of it. The converted queries can be executed directly on the cloud server; the encrypted query results are then sent to the SQL gateway, decrypted and fed back to the client applications.

II. RELATED WORKS

Various security issues have been addressed in the cloud database scenario [5], and the problem of access control and trustiness is discussed in [6]. A resource configuration driven by cost-performance is introduced in [7]. Hacigümüş et al. devise a query rewriting mechanism for encrypted databases [4] and discuss its key updating problem in [8]. An encrypted index method is introduced in [9], which uses the encrypted data to build the index on the server. Another attempt to solve the problem of executing queries on encrypted data is the order-preserving method introduced in [10], which adopts a non-standard algorithm to encrypt numeric data so that the order of the numbers remains the same in the ciphertext. Secret sharing algorithms can be used to preserve the privacy of outsourced data [11]. Attribute-based encryption for database outsourcing is introduced in [13]. Other security techniques in cloud databases include access control mechanisms based on data sharing schemes [12], fine-grained private search and decryption [14] and multi-keyword search over encrypted cloud data with ranking [15].

III. SQL GATEWAY

We assume that the database is relational and that its tables are encrypted at fine (attribute-level) granularity. Fig. 1 shows an example of such encrypted tables.

recID    Name     SSN        Address
800001   Tifeny   01586557   St.Lous Str…
200005   Meya     17290000   Lincon avenue …
300006   Jacab    85953165   VT Sanjonse…

The user’s perspective of the relation R

empkey   Name     SSN        Address
800001   Tifeny   t*#&¡@_    St.Lous Str…
200005   Meya     (¡+EF022   Lincon avenue …
300006   Jacab    UZ?G#&     VT Sanjonse…

The actual storage of the relation R on the server

Figure 1. An example of attribute grain database encryption.

When the user application submits the query select * from R where SSN = ‘01586557’, the query that the server actually executes should be select * from R where SSN = ‘t*#&¡@_’. This conversion is carried out by our SQL gateway, which is deployed on the client side. Fig. 2 shows the system structure.
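The equality rewrite described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the key, the helper names `det_token` and `rewrite_equality`, and the use of an in-memory SQLite database as the "server" are all assumptions. A truncated HMAC stands in for the ciphertext only because equality matching needs nothing more than a deterministic mapping; it is not real encryption and cannot be decrypted.

```python
import hashlib
import hmac
import sqlite3

KEY = b"gateway-demo-key"  # hypothetical key, held only by the gateway
COLUMNS = {"recID", "Name", "SSN", "Address"}
ENCRYPTED = {"SSN"}  # assumed set of encrypted attributes


def det_token(value: str, key: bytes = KEY) -> str:
    """Deterministic stand-in for ciphertext: equal inputs give equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def rewrite_equality(column: str, constant: str):
    """Gateway step: turn `column = constant` into its server-side form."""
    if column not in COLUMNS:
        raise ValueError(f"unknown column: {column}")
    if column in ENCRYPTED:
        constant = det_token(constant)
    return f"SELECT * FROM R WHERE {column} = ?", (constant,)


# The "server" only ever stores tokens in the SSN column.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE R (recID TEXT, Name TEXT, SSN TEXT, Address TEXT)")
for rec_id, name, ssn, address in [
    ("800001", "Tifeny", "01586557", "St.Lous Str…"),
    ("200005", "Meya", "17290000", "Lincon avenue …"),
    ("300006", "Jacab", "85953165", "VT Sanjonse…"),
]:
    server.execute(
        "INSERT INTO R VALUES (?, ?, ?, ?)",
        (rec_id, name, det_token(ssn), address),
    )

# The application asks for SSN = '01586557'; the server sees only the token.
sql, params = rewrite_equality("SSN", "01586557")
matches = server.execute(sql, params).fetchall()
```

The server finds the matching row without ever seeing the plaintext SSN, which is the property the gateway relies on for equality predicates.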

QRY is the original query from the user application, which is converted into $QRY_E$ and $QRY_C$. $RST_E$ is the result of executing $QRY_E$ on the server; it is then decrypted and queried by $QRY_C$. RST represents the final query result that is fed back to the user application. As encrypted data can only be re-queried after decryption, RST is usually a subset of $RST_E$. Our goal is to minimize the size of $RST_E$, so that the network traffic and the decryption time are reduced.

2013 International Conference on Information Technology and Applications

978-1-4799-2876-7/13 $31.00 © 2013 IEEE

DOI 10.1109/ITA.2013.40

Figure 2. System structure of the SQL gateway.

A. Definitions

FS: the collection of all the attributes in the database schema.

EFS: the collection of all the encrypted attributes in the database schema, a subset of FS.

$QRY = \langle sList, cond, tbls \rangle$ constitutes a select SQL query, in which $sList \subseteq FS$, cond is the set of predicates in the where clause, and tbls represents the relations that are involved in the query.

$STK = \langle fld, OP, const \rangle$ is a simple comparison predicate between an attribute and a constant.

$DTK = \langle fld1, OP, fld2 \rangle$ is a comparison predicate between two attributes.

$ETK = \langle OUTERFLDS, QRY \rangle$ is a predicate containing an EXISTS sub-query. OUTERFLDS is the set of attributes from the outer query that QRY involves.

$ITK = \langle InField, OUTERFLDS, QRY \rangle$ is a predicate containing an IN sub-query. The syntax is InField IN (QRY), and OUTERFLDS is the set of attributes from the outer query that QRY involves.

$OP \in \{<, \le, =, \ge, >, like\}$ is a comparison operator.

$$cond ::= cond \wedge cond \mid cond \vee cond \mid \neg cond \mid STK \mid DTK \mid ETK \mid ITK$$
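One way to make these definitions concrete is as immutable record types. The sketch below is only an illustration of the structure: the class layout, the Python field names (`s_list`, `outer_flds`, `in_field`) and the `And`/`Or`/`Not` wrappers for the grammar's connectives are assumptions, not part of the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class STK:
    """Simple predicate: attribute OP constant."""
    fld: str
    op: str
    const: object


@dataclass(frozen=True)
class DTK:
    """Predicate comparing two attributes."""
    fld1: str
    op: str
    fld2: str


@dataclass(frozen=True)
class QRY:
    """Select query <sList, cond, tbls>."""
    s_list: FrozenSet[str]
    cond: Cond
    tbls: FrozenSet[str]


@dataclass(frozen=True)
class ETK:
    """EXISTS (QRY), with the outer-query attributes that QRY involves."""
    outer_flds: FrozenSet[str]
    qry: QRY


@dataclass(frozen=True)
class ITK:
    """InField IN (QRY)."""
    in_field: str
    outer_flds: FrozenSet[str]
    qry: QRY


@dataclass(frozen=True)
class And:
    left: Cond
    right: Cond


@dataclass(frozen=True)
class Or:
    left: Cond
    right: Cond


@dataclass(frozen=True)
class Not:
    inner: Cond


Cond = Union[STK, DTK, ETK, ITK, And, Or, Not]

# select * from R where SSN = '01586557', expressed with these types
example = QRY(
    s_list=frozenset({"recID", "Name", "SSN", "Address"}),
    cond=STK("SSN", "=", "01586557"),
    tbls=frozenset({"R"}),
)
```

Frozen dataclasses are used so that predicates are hashable and cannot be modified by accident while a query is being split.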

In our scheme, an IN or EXISTS sub-query is first treated as a predicate and then decomposed as a select query. With the above definitions in place, we define a decision function that returns whether or not a combined predicate contains encrypted attributes, as follows:

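$$hasEFld : Cond \rightarrow \{true, false\}$$

$$hasEFld(c) = \begin{cases} true, & c.fld \in EFS \\ false, & c.fld \notin EFS \end{cases} \qquad (c ::= STK \wedge c.OP \neq \text{"="})$$

$$hasEFld(c) = false, \qquad c ::= STK \wedge c.OP == \text{"="}$$

$$hasEFld(c) = \begin{cases} true, & c.fld1 \in EFS \vee c.fld2 \in EFS \\ false, & c.fld1 \notin EFS \wedge c.fld2 \notin EFS \end{cases} \qquad (c ::= DTK \wedge c.OP \neq \text{"="})$$

$$hasEFld(c) = \begin{cases} true, & (c.fld1 \in EFS \wedge c.fld2 \notin EFS) \vee (c.fld1 \notin EFS \wedge c.fld2 \in EFS) \\ false, & (c.fld1 \in EFS \wedge c.fld2 \in EFS) \vee (c.fld1 \notin EFS \wedge c.fld2 \notin EFS) \end{cases} \qquad (c ::= DTK \wedge c.OP == \text{"="})$$

$$hasEFld(c) = \begin{cases} true, & hasEFld(c1) \vee hasEFld(c2) \\ false, & \neg hasEFld(c1) \wedge \neg hasEFld(c2) \end{cases} \qquad (c ::= cond,\ c1 ::= cond,\ c2 ::= cond,\ (c ::= c1 \wedge c2) \vee (c ::= c1 \vee c2))$$

$$hasEFld(c) = hasEFld(c.QRY.cond), \qquad c ::= ETK$$

$$hasEFld(c) = (c.InField \notin EFS \wedge c.QRY.sList \cap EFS \neq \emptyset) \vee (c.InField \in EFS \wedge c.QRY.sList \cap EFS = \emptyset) \vee hasEFld(c.QRY.cond), \qquad c ::= ITK$$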
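The case analysis of hasEFld maps naturally onto a recursive function. The following sketch assumes a tuple encoding of predicates that is not in the paper: `("STK", fld, op, const)`, `("DTK", fld1, op, fld2)`, `("AND" | "OR", c1, c2)`, `("ETK", outer_flds, qry)` and `("ITK", in_field, outer_flds, qry)`, where a sub-query is a dict with the keys `sList`, `cond` and `tbls`. Negation is left out because the definition above does not cover it.

```python
def has_efld(c, efs):
    """Return True when predicate c needs client-side handling under EFS."""
    kind = c[0]
    if kind == "STK":
        _, fld, op, _const = c
        # An equality test can be answered by encrypting the constant.
        return op != "=" and fld in efs
    if kind == "DTK":
        _, fld1, op, fld2 = c
        e1, e2 = fld1 in efs, fld2 in efs
        if op == "=":
            # Only a mixed pair (one encrypted, one plain) is a problem.
            return e1 != e2
        return e1 or e2
    if kind in ("AND", "OR"):
        return has_efld(c[1], efs) or has_efld(c[2], efs)
    if kind == "ETK":
        _, _outer_flds, qry = c
        return has_efld(qry["cond"], efs)
    if kind == "ITK":
        _, in_field, _outer_flds, qry = c
        s_list_encrypted = bool(set(qry["sList"]) & efs)
        mismatch = (in_field in efs) != s_list_encrypted
        return mismatch or has_efld(qry["cond"], efs)
    raise ValueError(f"unsupported predicate kind: {kind}")


EFS = {"SSN"}
eq_on_encrypted = ("STK", "SSN", "=", "01586557")
range_on_encrypted = ("STK", "SSN", ">", "10000000")
mixed_join = ("DTK", "SSN", "=", "Name")
```

With `EFS = {"SSN"}`, the equality predicate is reported as server-executable, while the range predicate and the mixed comparison are not.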

Then we define another function which returns a collection of attributes that a predicate involves as follows:

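$$inFld : cond \rightarrow 2^{FS}$$

$$inFld(c) = \begin{cases} \{c.fld\}, & c ::= STK \\ \{c.fld1, c.fld2\}, & c ::= DTK \\ c.OUTERFLDS, & c ::= ETK \\ \{c.InField\} \cup c.OUTERFLDS, & c ::= ITK \end{cases}$$

$$inFld(c) = inFld(c1) \cup inFld(c2), \qquad c ::= cond,\ c1 ::= cond,\ c2 ::= cond,\ (c ::= c1 \wedge c2) \vee (c ::= c1 \vee c2)$$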
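The attribute-collection function can be sketched the same way. The tuple encoding of predicates used here (`("STK", fld, op, const)`, `("DTK", fld1, op, fld2)`, `("ETK", outer_flds, qry)`, `("ITK", in_field, outer_flds, qry)`, `("AND" | "OR", c1, c2)`) is an assumption made for illustration.

```python
def in_fld(c):
    """Return the set of attributes that predicate c involves."""
    kind = c[0]
    if kind == "STK":
        return {c[1]}
    if kind == "DTK":
        return {c[1], c[3]}
    if kind == "ETK":
        # Only the outer-query attributes are visible at this level.
        return set(c[1])
    if kind == "ITK":
        return {c[1]} | set(c[2])
    if kind in ("AND", "OR"):
        return in_fld(c[1]) | in_fld(c[2])
    raise ValueError(f"unsupported predicate kind: {kind}")


combined = ("AND", ("STK", "SSN", "=", "01586557"), ("DTK", "Name", "<", "Address"))
fields = in_fld(combined)
```

Note that for sub-query predicates only the attributes visible to the outer query are returned; the attributes inside the sub-query are handled when the sub-query itself is converted.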

B. Algorithm Convert1

We use the notation $\{Con\}_k$ to represent the result of encrypting the constant Con with key k. The input of the algorithm is a query QRY; the outputs are the query $QRY_E$ for the server and the query $QRY_C$ for the client SQL gateway.

Convert QRY.cond into a conjunctive normal form $\bigwedge_{i=1}^{n} W_i$
// initialize non encryption flag
nefg = TRUE;
FOR (i = 1; i < n+1; i++)
    IF (!hasEFld(W_i))
        nefg := nefg ∧ W_i;
// encrypt equality comparison operand
WHILE (there is some conjunctive item containing an unprocessed c | c ::= STK ∧ c.OP == "=" ∧ c.fld ∈ EFS)
    c.const = {c.const}_k;
// initialize encryption flag
efg = TRUE;
FOR (i = 1; i < n+1; i++)
    IF (hasEFld(W_i))
        efg := efg ∧ W_i;
QRY_E.sList = inFld(efg) ∪ QRY.sList;
QRY_E.cond = nefg;
QRY_C.sList = QRY.sList;
QRY_C.cond = efg;

[Figure 2 diagram: the SQL gateway (conversion, encryption, decryption) between the user application and the cloud database server, with flows labeled QRY, QRY_E, RST_E, QRY_C and RST.]


If there is no encrypted predicate to optimize, $QRY_E$ will be the same as QRY, and the user query will be handled directly by the cloud server.
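A compact sketch of Convert1 may help to show how a query is split. It makes several assumptions that are not in the paper: the condition is already in conjunctive normal form, given as a list of clauses that each hold `("STK", fld, op, const)` or `("DTK", fld1, op, fld2)` tuples; sub-queries are ignored; a truncated HMAC (`det_token`) stands in for $\{Con\}_k$; and equality constants are rewritten only in clauses that are sent to the server.

```python
import hashlib
import hmac


def det_token(value, key):
    """Deterministic stand-in for {Con}_k (not real, decryptable encryption)."""
    return hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def atom_has_efld(atom, efs):
    """hasEFld restricted to STK and DTK atoms."""
    if atom[0] == "STK":
        _, fld, op, _const = atom
        return op != "=" and fld in efs
    _, fld1, op, fld2 = atom
    e1, e2 = fld1 in efs, fld2 in efs
    return (e1 != e2) if op == "=" else (e1 or e2)


def atom_fields(atom):
    """inFld restricted to STK and DTK atoms."""
    return {atom[1]} if atom[0] == "STK" else {atom[1], atom[3]}


def convert1(s_list, cnf, efs, key):
    """Split a query into a server part (QRY_E) and a gateway part (QRY_C)."""
    nefg, efg = [], []  # an empty list of clauses plays the role of TRUE
    for clause in cnf:
        if any(atom_has_efld(a, efs) for a in clause):
            efg.append(list(clause))
        else:
            # Server-executable clause: encrypt equality constants on EFS.
            nefg.append([
                ("STK", a[1], a[2], det_token(a[3], key))
                if a[0] == "STK" and a[2] == "=" and a[1] in efs else a
                for a in clause
            ])
    efg_fields = set()
    for clause in efg:
        for atom in clause:
            efg_fields |= atom_fields(atom)
    qry_e = {"sList": set(s_list) | efg_fields, "cond": nefg}
    qry_c = {"sList": set(s_list), "cond": efg}
    return qry_e, qry_c


EFS = {"SSN", "Address"}
cnf = [
    [("STK", "SSN", "=", "01586557")],     # server side after encryption
    [("STK", "Address", "like", "St%")],   # must wait for decryption
]
qry_e, qry_c = convert1({"Name"}, cnf, EFS, b"demo-key")
```

In this example the server query carries the encrypted equality test and additionally selects Address, so that the gateway can decrypt that column and apply the remaining like predicate.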

C. Algorithm Convert2

The input of algorithm Convert2 is the output query $QRY_C$ from algorithm Convert1; Convert2 outputs the altered $QRY_C$.

Breadth-first search for each predicate c_i in QRY_C.cond and in its sub-queries; if c_i ::= ETK ∨ c_i ::= ITK:
    Convert c_i.QRY.cond into a conjunctive normal form $\bigwedge_{j=1}^{n} W_j$
    // initialize non encryption flag
    nefg = TRUE;
    FOR (j = 1; j < n+1; j++)
        IF (W_j does not contain any predicate that involves an attribute from the outer query)
            nefg := nefg ∧ W_j;
    // initialize encryption flag
    efg := TRUE;
    FOR (j = 1; j < n+1; j++)
        IF (W_j contains some predicate that involves an attribute from the outer query)
            efg := efg ∧ W_j;
    Generate a new query statement Q_i' for c_i;
    Q_i'.cond = nefg;
    Q_i'.sList = c_i.QRY.sList ∪ inFld(efg) − c_i.OUTERFLDS;
    Apply algorithms Convert1, Convert2 and Convert3 to Q_i' as an ordinary query;
    Make a temporary client-side data table T_{c_i.QRY} corresponding to c_i;
    c_i.QRY.tbls = { T_{c_i.QRY} };
    c_i.QRY.cond = efg;
Output QRY_C;
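The core step of Convert2, separating the parts of a sub-query that depend on the outer query from the parts that do not, can be sketched as below. Everything here is illustrative: the clause encoding (lists of `("STK", ...)` or `("DTK", ...)` tuples in conjunctive normal form), the function name `split_subquery`, the table and column names, and the reading of the sList formula as (sList ∪ inFld(efg)) − OUTERFLDS are assumptions.

```python
def atom_fields(atom):
    """Attributes used by an STK or DTK atom."""
    return {atom[1]} if atom[0] == "STK" else {atom[1], atom[3]}


def split_subquery(sub_s_list, sub_cnf, outer_flds):
    """Return (Q', efg): the independent query Q' and the correlated clauses."""
    nefg, efg = [], []
    for clause in sub_cnf:
        fields = set()
        for atom in clause:
            fields |= atom_fields(atom)
        if fields & outer_flds:
            efg.append(clause)    # correlated: evaluated at the gateway
        else:
            nefg.append(clause)   # independent: can go into Q'
    efg_fields = set()
    for clause in efg:
        for atom in clause:
            efg_fields |= atom_fields(atom)
    q_prime = {
        "sList": (set(sub_s_list) | efg_fields) - outer_flds,
        "cond": nefg,
    }
    return q_prime, efg


# Hypothetical sub-query: select D.id from D
#                         where D.budget > 100 and D.id = E.dept
# where E.dept belongs to the outer query.
sub_cnf = [
    [("STK", "D.budget", ">", 100)],
    [("DTK", "D.id", "=", "E.dept")],
]
q_prime, correlated = split_subquery({"D.id"}, sub_cnf, {"E.dept"})
```

The result of `q_prime` would then fill the temporary client-side table, against which the correlated clauses are evaluated.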

D. Algorithm Convert3

The input contains the query QRY, the query $QRY_C$ from algorithm Convert2 and the query result RST of $QRY_E$ from algorithm Convert1. Its output $RST_C$ is the final result for the user application.

fc = inFld(QRY_C.cond) ∩ EFS;
IF (fc ≠ ∅) {
    Decrypt all the attributes of fc in RST;
    Apply query QRY_C to RST to get result RST_C;
}
ELSE
    RST_C = RST;
IF (QRY.sList ∩ EFS − fc ≠ ∅)
    Decrypt the attributes of QRY.sList ∩ EFS − fc in RST_C;
Output RST_C;
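The gateway-side post-processing of Convert3 can be sketched as follows. The sketch is built on stated assumptions: rows are Python dicts, $QRY_C$ is represented by the set of attributes its condition uses plus a filter callable, and a toy XOR keystream cipher (`toy_encrypt` / `toy_decrypt`) stands in for the real encryption algorithm. The toy cipher is insecure and only serves to make the example runnable.

```python
import hashlib


def _keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from the key (toy construction)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def toy_encrypt(text: str, key: bytes) -> str:
    data = text.encode("utf-8")
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data)))).hex()


def toy_decrypt(token: str, key: bytes) -> str:
    data = bytes.fromhex(token)
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data)))).decode("utf-8")


def convert3(s_list, qry_c_fields, qry_c_filter, rst, efs, key):
    """Decrypt what QRY_C needs, apply QRY_C, then decrypt the remaining output."""
    fc = set(qry_c_fields) & efs
    if fc:
        decrypted = [
            {**row, **{f: toy_decrypt(row[f], key) for f in fc}} for row in rst
        ]
        # Applying QRY_C means filtering by its condition and projecting sList.
        rst_c = [
            {f: row[f] for f in s_list} for row in decrypted if qry_c_filter(row)
        ]
    else:
        rst_c = [dict(row) for row in rst]
    rest = (set(s_list) & efs) - fc
    return [{**row, **{f: toy_decrypt(row[f], key) for f in rest}} for row in rst_c]


KEY = b"demo-key"
EFS = {"SSN", "Address"}
rst = [
    {"Name": "Tifeny", "Address": toy_encrypt("St.Lous Str", KEY)},
    {"Name": "Meya", "Address": toy_encrypt("Lincon avenue", KEY)},
]
result = convert3(
    ["Name"], {"Address"}, lambda row: row["Address"].startswith("St"), rst, EFS, KEY
)
```

Only the attributes that the gateway-side condition needs are decrypted before filtering; encrypted output columns are decrypted afterwards, on the already reduced result.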

IV. PERFORMANCE EVALUATION

We set up a simplified cloud database environment with a 1 Gbps LAN connection between the cloud server and the client users. We populate the database with approximately 3 GB of data using the DBGEN program from the TPC-H website [16].

We define the average query execution time as the time span from the sending of the client query to the receipt of the final result by the user application. We define the proportion of encrypted data in table R as

$$\eta(R) = \sum A^{S}_{R} \Big/ \sum A_{R},$$

in which $\sum A^{S}_{R}$ is the storage space of the encrypted attributes and $\sum A_{R}$ is the total data volume. We define $\eta$ to be the arithmetic mean of $\eta(R)$ over all tables in the database. Different encryption schemas are chosen with different $\eta$ values. We compare the average query execution times with those of the tuple-level encryption mechanism in [4] and [8], which is configured with five ABN (average bucket number) values. The results in Fig. 3 show that when the proportion is lower than 10%, our scheme has a smaller average query execution time than the tuple-level encryption method. Although this advantage is limited to proportions below 10%, that range fits quite a few real-world applications, which have only a small proportion of sensitive data that needs protection.

Figure 3. Average query execution time versus the proportion of encrypted data.
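The proportion of encrypted data defined above is simple arithmetic. A small sketch with made-up attribute sizes shows the per-table ratio and its mean over the database; the function names and all numbers are hypothetical and are not taken from the experiments.

```python
def eta_table(attr_sizes, encrypted):
    """Encrypted storage divided by total storage for one table."""
    total = sum(attr_sizes.values())
    enc = sum(size for attr, size in attr_sizes.items() if attr in encrypted)
    return enc / total


def eta_database(tables):
    """Arithmetic mean of the per-table proportions."""
    ratios = [eta_table(sizes, enc) for sizes, enc in tables]
    return sum(ratios) / len(ratios)


# Hypothetical storage sizes per attribute (any consistent unit).
tables = [
    ({"recID": 10, "Name": 50, "SSN": 10, "Address": 30}, {"SSN"}),
    ({"id": 20, "note": 180}, set()),
]
eta = eta_database(tables)
```

Here one table has a tenth of its volume encrypted and the other has none, so the database-level value lies below the 10% threshold discussed above.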

V. CONCLUSIONS

In this paper, we introduce a SQL gateway to the cloud database environment. It converts user SQL queries to their encryption-specific forms using the three algorithms that we propose. The converted queries can be executed directly by cloud servers against encrypted data, and the conversion is transparent to client applications. Experiments show that, when only a small proportion of the data is encrypted, our scheme can improve the performance of query processing in encrypted cloud databases and reduce the network traffic for intermediate data transmission.

REFERENCES

[1] Hakan Hacigümüş, Bala Iyer, and S Mehrotra. Providing Database as a Service. In Proc of ICDE, 2002, pp 29–38.

[2] Feifei Li, Marios Hadjieleftheriou, George Kollios, Leonid Reyzin. Dynamic authenticated index structures for outsourced databases. In Proc of ACM Management of Data (SIGMOD), 2006, pp 121–132.

[3] Luca Ferretti, Michele Colajanni, Mirco Marchetti. Supporting Security and Consistency for Cloud Database. Lecture Notes in Computer Science Volume 7672, 2012, pp 179–193.


[4] Hakan Hacigümüş, Bala Iyer, Chen Li, Sharad Mehrotra. Executing SQL over encrypted data in the database-service-provider model. In Proc of the 2002 ACM SIGMOD, pp 223–234.

[5] Divyakant Agrawal, Amr El Abbadi, Sudipto Das, Aaron J. Elmore. Database Scalability, Elasticity, and Autonomy in the Cloud. Lecture Notes in Computer Science Volume 6587, 2011, pp 2–15.

[6] Jong P. Yoon. Access Control and Trustiness for Resource Management in Cloud Databases. Grid and Cloud Database Management, 2011, pp 109–131.

[7] Shoubin Kong, Yuanping Li, Ling Feng. Cost-Performance Driven Resource Configuration for Database Applications in IaaS Cloud Environments. Cloud Computing and Services Science, 2012, pp 111–129.

[8] Hakan Hacigümüş and Sharad Mehrotra. Efficient Key Updates in Encrypted Database Systems. SDM 2005, LNCS 3674, pp 1–15.

[9] Bijit Hore, Sharad Mehrotra, Gene Tsudik. A Privacy-Preserving Index for Range Queries. VLDB 2004, pp 720–731.

[10] R Agrawal, J Kiernan, R Srikant, and Y Xu. Order preserving encryption for numeric data. In Proc of ACM SIGMOD 2004, pp 563–574.

[11] Divyakant Agrawal, Amr El Abbadi, Fatih Emekci, Ahmed Metwally, Shiyuan Wang. Secure Data Management Service on Cloud Computing Infrastructures. Lecture Notes in Business Information Processing Volume 74, 2011, pp 57–80.

[12] Mariana Raykova, Hang Zhao and Steven M. Bellovin. Privacy Enhanced Access Control for Outsourced Data Sharing. Lecture Notes in Computer Science Volume 7397, 2012, pp 223–238.

[13] Jingwei Li, Chunfu Jia, Jin Li and Xiaofeng Chen. Outsourcing Encryption of Attribute-Based Encryption with MapReduce. Lecture Notes in Computer Science Volume 7618, 2012, pp 191–201.

[14] Yanbin Lu, Gene Tsudik. Enhancing Data Privacy in the Cloud. IFIP Advances in Information and Communication Technology Volume 358, 2011, pp 117–132.

[15] Örencik, C., Savaş. Efficient and secure ranked multi-keyword search on encrypted cloud data. In Proc of the 2012 Joint EDBT/ICDT Workshops, pp 186–195.

[16] TPC-H. Benchmark Specification. http://www.tpc.org
