Security Issues in Querying Encrypted Data


  • 7/31/2019 Security Issues in Querying Encrypted Data


    Purdue Computer Science Technical Report 04-013

    Security Issues in Querying Encrypted Data

    Murat Kantarcıoğlu, Chris Clifton

    Purdue University

    Department of Computer Sciences

    250 N University St

    West Lafayette, IN 47907-2066 USA

    +1-765-494-6408, Fax: +1-765-494-0739

    {kanmurat, clifton}@cs.purdue.edu

    March 31, 2004

    Abstract

    There has been considerable interest in querying encrypted data, allowing a secure database server model where the server does not know data values. This paper shows how results from cryptography prove the impossibility of developing a server that meets cryptographic-style definitions of security and is still efficient enough to be practical. The weaker definitions of security supported by previous secure database server proposals have the potential to reveal significant information. We propose a definition of a secure database server that provides probabilistic security guarantees, and sketch how a practical system meeting the definition could be built and proven secure. The primary goal of this paper is to provide a vision of how research in this area should proceed: efficient encrypted database and query processing with provable security properties.

    1 Introduction

    There has been considerable interest in the notion of a secure database service: a Database Management System that could manage a database without knowing the contents [11]. While the business model is compelling, it is important that such a system be provably secure. Existing proposals have problems in this respect; the security provided leaves room for information leaks.

    Any method for database encryption that does not meet rigorous cryptography-based standards of security must be used carefully. For example, methods that quantize or bin values [11] reveal data distributions. Methods that hide distribution but preserve order [3] can also disclose information if used naively. While they may effectively hide values in isolation, using such techniques on multiple attributes in a tuple can pose dangers. We provide more discussion of the successes and potential pitfalls of related work in Section 6; we now give an example of how naive use of prior proposals can disclose sensitive information.

    Suppose a bank is trying to find who is responsible for missing money (e.g., fraud or embezzlement). They have gathered information on suspect employees and customers. Even though much of the information is publicly known (name, size of mortgage, age, postal code, ...), simply revealing who is being investigated is sensitive: the appearance that you are accusing a customer of fraud could well lead to a libel suit. Therefore they have encrypted each of the values using an order-preserving encryption scheme. Are they protected?

    The answer is probably not. Assume a newspaper wants to know if an individual, Chris, is being investigated. They obtain the encrypted database. They know that the name Chris would rank at about 15% of all names, so if it appears in the encrypted database, it will be roughly in that position (the range for a given sample size and probability can be calculated using order statistics). The newspaper can do the same with size of mortgage, age, and other known data about Chris, and with the other employees/customers of the bank. If there is a tuple in the database whose rank on all attributes is close to the corresponding rank of Chris (in the overall dataset), and there is no other tuple among the customers/employees whose ranks are similar, then the newspaper knows that, with high probability, Chris is under investigation.
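    The rank-matching attack described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `make_ope` is a toy strictly increasing table (not the scheme of [3] and not secure), the population is synthetic with four numeric attributes, and the names `rank_fraction`, `link_by_rank`, and the tolerance `tol` are hypothetical choices made for this example.

```python
import bisect
import random

def make_ope(domain_size, rng):
    """Toy order-preserving map: a strictly increasing random table.
    Illustration only; NOT the scheme of [3] and not secure."""
    table, c = [], 0
    for _ in range(domain_size):
        c += rng.randint(1, 1000)
        table.append(c)
    return table

def rank_fraction(column, v):
    """Fraction of entries in column that are strictly smaller than v."""
    s = sorted(column)
    return bisect.bisect_left(s, v) / len(s)

def link_by_rank(population, target, enc_db, tol):
    """Indices of encrypted tuples whose rank on every attribute is within
    tol of the target's rank in the publicly known population."""
    n_attr = len(target)
    pub_cols = [[p[a] for p in population] for a in range(n_attr)]
    enc_cols = [[t[a] for t in enc_db] for a in range(n_attr)]
    want = [rank_fraction(pub_cols[a], target[a]) for a in range(n_attr)]
    hits = []
    for i, t in enumerate(enc_db):
        if all(abs(rank_fraction(enc_cols[a], t[a]) - want[a]) <= tol
               for a in range(n_attr)):
            hits.append(i)
    return hits

# Synthetic setting (assumption): 1000 known people, 200 of them investigated.
rng = random.Random(2004)
N_ATTR, DOMAIN = 4, 10_000
population = [tuple(rng.randrange(DOMAIN) for _ in range(N_ATTR))
              for _ in range(1000)]
suspects = rng.sample(population, 200)
target = suspects[0]  # the individual the attacker is curious about

# The bank encrypts each attribute under its own order-preserving key.
keys = [make_ope(DOMAIN, rng) for _ in range(N_ATTR)]
enc_db = [tuple(keys[a][t[a]] for a in range(N_ATTR)) for t in suspects]

# The attacker sees only enc_db plus the public population.
candidates = link_by_rank(population, target, enc_db, tol=0.15)
print(len(candidates), "of", len(enc_db), "encrypted tuples match the target's ranks")
```

    Each additional order-preserving attribute narrows the candidate set further, which is the point of the example: the ciphertext values are never inverted, yet the ranks alone link a tuple to a known individual.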

    The key problem is that while encrypting a single value using order-preserving encryption or a binning scheme may reveal little information, supporting multiple index keys for each tuple reveals a surprising amount. To protect against such naive misuse of order-preserving, homomorphic, or other such encryption techniques, we propose definitions for what it means for an encrypted database to be secure.


    This paper presents a vision for how research enabling a secure database service should proceed: establish solid definitions of "secure", develop encryption and query processing techniques that meet those definitions, and demonstrate that such techniques have practical promise. We start with definitions of security from the cryptography community that have withstood the test of time. To what extent can we apply these definitions to a secure DBMS, enabling a proof of security? Section 3 gives security definitions for database and query indistinguishability based on the cryptographic concept of message indistinguishability. This leads to a troubling result: prior work in cryptography shows that a secure DBMS server meeting these definitions requires that the cost of every query be linear in the size of the database, making a secure DBMS impractical for real-world use.

    Section 4 begins the real contribution of this paper: a slightly relaxed definition that gives probabilistic guarantees of security. For the data itself, security is equivalent to strong cryptographic definitions. An adversary tracing query execution could conceivably infer information over many queries, but the quality of the information decays exponentially: by the time enough queries have been seen to infer anything sensitive, the relationship between the early and later queries will have been broken, so the adversary will be unable to infer sensitive data.

    In Section 4.2 we show that for this definition, a secure DBMS server with reasonable performance could be constructed. The one caveat is that it requires the existence of a secure execution module: a way of running programs on the server that are hidden from the server. We show how basic query processing operations (select, join, indexed search) can be implemented with a simple secure execution module supporting encryption, decryption, pseudo-random number generation, and comparison, and give sketches of how the operations could be proven to meet our definition of security. This paper addresses read-only queries (select-project-join); extension to insert/update is reasonable, but beyond the scope of the current work.
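    To make the role of such a module concrete, the following is a rough sketch of an equality select driven by an untrusted server. It is not the algorithm of this paper: `ToyCipher`, `SecureModule`, `select_equal`, and `server_select` are hypothetical names, the cipher is a toy HMAC-based stream construction used only so the example runs with the standard library, and this naive version emits one output per input tuple rather than aiming for efficiency. It uses only the primitives named above (encryption, decryption, random number generation, comparison).

```python
import hashlib
import hmac
import secrets

class ToyCipher:
    """Toy randomized cipher (HMAC-SHA256 keystream). Illustration only."""
    def __init__(self, key):
        self._key = key

    def _stream(self, nonce, n):
        out, ctr = b"", 0
        while len(out) < n:
            block = nonce + ctr.to_bytes(4, "big")
            out += hmac.new(self._key, block, hashlib.sha256).digest()
            ctr += 1
        return out[:n]

    def encrypt(self, msg):
        nonce = secrets.token_bytes(16)  # fresh randomness per ciphertext
        body = bytes(a ^ b for a, b in zip(msg, self._stream(nonce, len(msg))))
        return nonce + body

    def decrypt(self, ct):
        nonce, body = ct[:16], ct[16:]
        return bytes(a ^ b for a, b in zip(body, self._stream(nonce, len(body))))

class SecureModule:
    """Hypothetical stand-in for the secure execution module.
    The key lives inside the module and is never shown to the server."""
    def __init__(self, cipher):
        self._c = cipher

    def select_equal(self, enc_value, enc_const):
        value = self._c.decrypt(enc_value)
        const = self._c.decrypt(enc_const)
        if hmac.compare_digest(value, const):
            return self._c.encrypt(b"\x01" + value)          # real result
        return self._c.encrypt(b"\x00" + bytes(len(value)))  # same-size dummy

def server_select(module, enc_column, enc_const):
    """Untrusted server: handles only ciphertexts, one module call per tuple."""
    return [module.select_equal(ct, enc_const) for ct in enc_column]

# Client side (holds the same key as the module).
cipher = ToyCipher(secrets.token_bytes(32))
module = SecureModule(cipher)
plain = [7, 3, 7, 9]
enc_column = [cipher.encrypt(v.to_bytes(8, "big")) for v in plain]
enc_query = cipher.encrypt((7).to_bytes(8, "big"))

enc_out = server_select(module, enc_column, enc_query)
decrypted = [cipher.decrypt(ct) for ct in enc_out]
matches = [int.from_bytes(p[1:], "big") for p in decrypted if p[0] == 1]
print(matches)
```

    Because every output is freshly encrypted and has the same length, the server in this sketch cannot tell from the ciphertexts which tuples matched; it still observes the access pattern, which is what the security definitions have to account for.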

    How do we get such a secure execution module? Program obfuscation, providing a program whose execution reveals no information to the server, has been proven impossible for several classes of program [6]. We provide evidence that, if possible at all, it is unlikely to be efficient for a secure DBMS. Fortunately, there is another solution: implementing the secure execution module in tamper-proof hardware. Such hardware exists; one example meeting our requirements is IBM's [12]. Section 5 shows how to use such a module to implement a system safe from software-driven attacks; the specifications and evaluation of the hardware tell us the protection provided against electronic/physical attacks.

    2 Background and Definitions from Cryptography

    The cryptography community has developed solid and well-regarded definitions for securely encrypting a message. The result of this research is that secure encryption should hide any partial information about the data: one should not be able to distinguish between encryptions of two different messages of the same length. Consider the following scenario where a stock trader sends buy or sell messages for a particular stock. An adversary knows messages are either buy or sell, but indistinguishability guarantees the adversary will have no clue which one was sent.
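    The buy/sell scenario can be illustrated for a sequence of messages sent under one key. The sketch below is an assumption-laden toy, not a vetted cipher: the keystream is built from HMAC-SHA256 purely so the example runs with the standard library, and the function names are hypothetical. It contrasts a deterministic scheme, where repeated orders produce repeated ciphertexts, with a randomized one, where they do not.

```python
import hashlib
import hmac
import secrets

def keystream(key, nonce, length):
    """Toy keystream: HMAC-SHA256(key, nonce || counter). Illustration only."""
    out, counter = b"", 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def enc_deterministic(key, msg):
    """Fixed nonce: the same message always yields the same ciphertext."""
    return xor(msg, keystream(key, b"\x00" * 16, len(msg)))

def enc_randomized(key, msg):
    """Fresh random nonce per message, sent along with the ciphertext."""
    nonce = secrets.token_bytes(16)
    return nonce + xor(msg, keystream(key, nonce, len(msg)))

def dec_randomized(key, ct):
    nonce, body = ct[:16], ct[16:]
    return xor(body, keystream(key, nonce, len(body)))

key = secrets.token_bytes(32)
BUY, SELL = b"buy ", b"sell"  # padded so both messages have equal length

# An eavesdropper compares two ciphertexts of the same order.
det_repeat = enc_deterministic(key, BUY) == enc_deterministic(key, BUY)
rand_repeat = enc_randomized(key, BUY) == enc_randomized(key, BUY)
print("deterministic repeats visible:", det_repeat)
print("randomized repeats visible:", rand_repeat)
```

    With the deterministic variant, an adversary who learns the meaning of one ciphertext recognizes every later occurrence of that order; the randomized variant removes this signal while the key holder can still decrypt.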

    A formal definition of indistinguishable private key encryption, from [8], is:

    Definition 2.1 Indistinguishability of encryptions [8]

    An encryption scheme with efficient key generation algorithm $G$, efficient encryption algorithm $E$, and decryption algorithm $D$, where $D_k(E_k(x)) = x$ for secret key $k$, has indistinguishable private key encryption if for every polynomial-size circuit family $\{C_n\}$, every polynomial $p$, all sufficiently large $n$, and every $a, b \in \{0,1\}^{\mathrm{poly}(n)}$ with equal length,

    \[ \left| \Pr\{C_n(E_{G(1^n)}(a)) = 1\} - \Pr\{C_n(E_{G(1^n)}(b)) = 1\} \right| < \frac{1}{p(n)} \]

    ..., the second <1, n>, <2, n - 1>, .... If the encryption preserves order, we can always distinguish between the two sequences with probability 1, as the tuples sort the same on both attributes in database 1, and sort in opposite order on the two attributes in database 2. Similar arguments can be used for encryption that preserves difference between the messages. While data values may be protected from direct inspection, a determined server with some additional knowledge (e.g., a history of queries) may be able to significantly compromise security.
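    The distinguishing argument above can be checked with a short sketch. Two assumptions are made here because the text does not spell them out: the first database is taken to be <1, 1>, <2, 2>, ..., <n, n>, and the order-preserving encryption is modeled by a toy strictly increasing random table (`toy_ope_table`, a hypothetical helper, not a secure scheme). The distinguisher looks only at ciphertext order.

```python
import random

def toy_ope_table(max_value, rng):
    """Strictly increasing random table for plaintexts 0..max_value.
    Toy model of order-preserving encryption; not secure."""
    table, c = [], 0
    for _ in range(max_value + 1):
        c += rng.randint(1, 100)
        table.append(c)
    return table

def encrypt_db(db, max_value, rng):
    """Encrypt each attribute under its own table, then shuffle the rows."""
    k1 = toy_ope_table(max_value, rng)
    k2 = toy_ope_table(max_value, rng)
    rows = [(k1[a], k2[b]) for a, b in db]
    rng.shuffle(rows)
    return rows

def guess_database(enc_rows):
    """Return 1 if both attributes sort the rows identically, else 2."""
    by_first = sorted(enc_rows, key=lambda r: r[0])
    by_second = sorted(enc_rows, key=lambda r: r[1])
    return 1 if by_first == by_second else 2

n = 50
rng = random.Random(0)
db1 = [(i, i) for i in range(1, n + 1)]          # assumed first sequence
db2 = [(i, n + 1 - i) for i in range(1, n + 1)]  # <1, n>, <2, n - 1>, ...

# Fresh keys on every trial; the guess never depends on the key.
guesses = [(guess_database(encrypt_db(db1, n, rng)),
            guess_database(encrypt_db(db2, n, rng)))
           for _ in range(20)]
print(all(g == (1, 2) for g in guesses))
```

    The distinguisher never recovers a plaintext value, yet it identifies the database on every trial, which is why order preservation is incompatible with an indistinguishability-style definition.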

    With a careful definition of the system environment, these methods can be secure. A method for order-preserving encryption that hides distributions, and an architecture for its use, is presented in [3]. The key is in their assumptions: the database software is trusted (the adversary only has access to the encrypted database, not the running system), and only one attribute in each relation uses order-preserving encryption. While secure within the assumptions, it does not meet our goal of a secure database service.

    A key distinction between our work and that described above is that we address what the server learns from processing queries. Statistical database work has shown that a sequence of queries can reveal information beyond that revealed by any single query [1]; this is just as true for queries on encrypted data. Private Information Retrieval has addressed this issue [7], but under the assumption that the data is known to the server and the query must be kept private. The results are discouraging; the data access lower bound for a single server is the entire database. However, by encrypting both data and queries, we can obtain reasonable levels of security and avoid the impossibility results of Private Information Retrieval.

    7 Conclusions

    The idea of a database server operating on encrypted data is a nice one: it opens up new business models, protects against unauthorized access, allows remote database services, etc. Achieving this vision requires compromises between security and efficiency. We have shown that a server that would be considered secure by the cryptography community would be hopelessly inefficient by standards of the database community. Efficient methods (e.g., operations on encrypted data) cannot meet cryptographic standards of security.

    We have given a definition of security that is the best that can be achieved while maintaining reasonable levels of performance. We have shown that this definition can be realized using commercially available special-purpose hardware.

    This definition and approach raise many questions. The first is real-world performance: what happens if we implement such a system? We plan to pursue such an implementation. This will lead to many challenges: more efficient join and indexing strategies that meet security requirements, concurrency control that does not violate security, query optimization approaches, etc. A second issue is when the security offered by Definition 4.4 is inadequate, and only (inefficient) approaches meeting Definitions 3.2-3.3 are adequate. While we have shown that our approach is the best we can (efficiently) do, the disclosures may be too much for some applications. Progress in this area will require better definitions of security; e.g., privacy definitions as rigorous as the security definitions that enabled progress in multilevel secure databases [16].

    In spite of these questions, a database server managing encrypted data is feasible. Such a server will provide substantial benefit, such as easing enforcement of several of the ten principles proposed for a Hippocratic Database system [2]. With careful and rigorous work on ensuring that security is achieved, we can expect to see significant progress in this area. Key areas for future work are:

    - Formal proof that Definition 4.3 adequately protects against inference from tracking query accesses,

    - Formal proof that the Definitions given are complete and consistent,

    - Methods for proving that query processing algorithms meet security definitions,

    - Query optimization methods that provide provably secure means of combining the various component algorithms,

    - Practical efficiency issues: query and data pipelining in conjunction with encryption (this will demand implementation on real hardware), and

    - System issues: How does this play out in real-world applications?

    We believe a new and important research direction is taking off; this paper shows how, with careful definition of what it means to be secure, such research can ensure the promised security while achieving practical efficiency.

    References

    [1] N. R. Adam and J. C. Wortmann, "Security-control methods for statistical databases: A comparative study," ACM Computing Surveys, vol. 21, no. 4, pp. 515-556, Dec. 1989. [Online]. Available: http://doi.acm.org/10.1145/76894.76895

    [2] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, "Hippocratic databases," in Proceedings of the 28th International Conference on Very Large Databases, Hong Kong, Aug. 20-23 2002, pp. 143-154. [Online]. Available: http://www.vldb.org/conf/2002/S05P02.pdf

    [3] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, "Order-preserving encryption for numeric data," in Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, Paris, France, June 13-18 2004.

    [4] N. Ahituv, Y. Lapid, and S. Neumann, "Processing encrypted data," Communications of the ACM, vol. 20, no. 9, pp. 777-780, Sept. 1987. [Online]. Available: http://doi.acm.org/10.1145/30401.30404


    [5] D. Asonov and J.-C. Freytag, "Almost optimal private information retrieval," in Second International Workshop on Privacy Enhancing Technologies PET 2002. San Francisco, CA, USA: Springer-Verlag, Apr. 14-15 2002, pp. 209-223. [Online]. Available: http://www.springerlink.com/openurl.asp?genre=article&issn=0302-9743&volume=2482&spage=209

    [6] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. P. Vadhan, and K. Yang, "On the (im)possibility of obfuscating programs," in Proceedings of the 21st Annual International Cryptology Conference on Advances in Cryptology (CRYPTO '01), J. Kilian, Ed. Santa Barbara, California: Springer-Verlag, Aug. 19-23 2001, pp. 1-18. [Online]. Available: http://www.springerlink.com/openurl.asp?genre=article&issn=0302-9743&volume=2139&spage=1

    [7] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan, "Private information retrieval," Journal of the ACM, vol. 45, no. 6, pp. 965-981, 1998. [Online]. Available: http://doi.acm.org/10.1145/293347.293350

    [8] O. Goldreich, The Foundations of Cryptography. Cambridge University Press, 2004, vol. 2, ch. Encryption Schemes. [Online]. Available: http://www.wisdom.weizmann.ac.il/~oded/PSBookFrag/enc.ps

    [9] O. Goldreich, The Foundations of Cryptography. Cambridge University Press, 2004, vol. 2, ch. General Cryptographic Protocols. [Online]. Available: http://www.wisdom.weizmann.ac.il/~oded/PSBookFrag/prot.ps

    [10] O. Goldreich and R. Ostrovsky, "Software protection and simulation on oblivious RAMs," Journal of the ACM, vol. 43, no. 3, pp. 431-473, May 1996. [Online]. Available: http://doi.acm.org/10.1145/233551.233553

    [11] H. Hacıgümüş, B. R. Iyer, C. Li, and S. Mehrotra, "Executing SQL over encrypted data in the database-service-provider model," in Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, June 4-6 2002, pp. 216-227. [Online]. Available: http://doi.acm.org/10.1145/564691.564717

    [12] "IBM PCI cryptographic coprocessor." [Online]. Available: http://www.ibm.com/security/cryptocards/html/pcicc.shtml


    [13] "CCA API performance - IBM PCI cryptographic coprocessor." [Online]. Available: http://www.ibm.com/security/cryptocards/html/perfcca.shtml

    [14] P. Lin and K. S. Candan, "Hiding traversal of tree structured data from untrusted data stores," in Proceedings of Intelligence and Security Informatics: First NSF/NIJ Symposium ISI 2003, Tucson, AZ, USA, June 2-3 2003, p. 385. [Online]. Available: http://www.springerlink.com/openurl.asp?genre=article&issn=0302-9743&volume=2665&spage=385

    [15] G. Özsoyoğlu, D. A. Singer, and S. S. Chung, "Anti-tamper databases: Querying encrypted databases," in Proceedings of the 17th Annual IFIP WG 11.3 Working Conference on Database and Applications Security, Estes Park, Colorado, Aug. 4-6 2003. [Online]. Available: http://art.cwru.edu/TOpapers/IFIP2003.Security.pdf

    [16] B. Thuraisingham and W. Ford, "Security constraint processing in a multilevel secure distributed database management system," IEEE Trans. Knowledge Data Eng., vol. 7, no. 2, Apr. 1995.

    [17] "TCG TPM specification version 1.2," Nov. 5 2003. [Online]. Available: https://www.trustedcomputinggroup.org
